We develop a new method for bounding the relative entropy of a random vector in
terms of its Stein factors. Our approach is based on a novel representation for the score
function of smoothly perturbed random variables, as well as on the de Bruijn's identity
of information theory. When applied to sequences of functionals of a general Gaussian
field, our results can be combined with the Carbery-Wright inequality in order to yield
multidimensional entropic rates of convergence that coincide, up to a logarithmic factor,
with those achievable in smooth distances (such as the 1-Wasserstein distance). In
particular, our findings settle the open problem of proving a quantitative version of the
multidimensional fourth moment theorem for random vectors having chaotic components,
with explicit rates of convergence in total variation that are independent of the order
of the associated Wiener chaoses. The results proved in the present paper are outside
the scope of other existing techniques, such as for instance the multidimensional Stein's
method for normal approximations.
1 Introduction

1.1 Overview and motivation 

The aim of this paper is to develop a new method for controlling the relative entropy of 
a general random vector with values in $\mathbb{R}^d$, and then to apply this technique to settle
a number of open questions concerning central limit theorems (CLTs) on a Gaussian 
space. Our approach is based on a fine analysis of the (multidimensional) de Bruijn's 
formula, which provides a neat representation of the derivative of the relative entropy 
(along the Ornstein-Uhlenbeck semigroup) in terms of the Fisher information of some 
perturbed random vector - see e.g. [TJ HI [SJ [TH |5T]. The main tool developed in this 
paper (see Theorem 2.10 as well as relation (1.14)) is a new powerful representation of
relative entropies in terms of Stein factors. Roughly speaking, Stein factors are random
variables verifying a generalised integration by parts formula (see (2.43) below): these
objects naturally appear in the context of the multidimensional Stein's method for normal
approximations (see e.g. [15, 33]), and implicitly play a crucial role in many probabilistic
limit theorems on Gaussian or other spaces (see e.g. [33J Chapter 6], as well as [3T1I3HI37] ). 

The study of the classical CLT for sums of independent random elements by entropic 
methods dates back to Linnik's seminal paper |28j . Among the many fundamental con- 
tributions to this line of research, we cite [3J [3J [5J [301 OH EH [HI HI] (see the monograph 
P32 for more details on the history of the theory). All these influential works revolve 
around a deep analysis of the effect of analytic convolution on the creation of entropy: in 
this respect, a particularly powerful tool are the 'entropy jump inequalities' proved and 
exploited e.g. in [3J O [501 E] • As discussed e.g. in [S], entropy jump inequalities are di- 
rectly connected with challenging open questions in convex geometry, like for instance the 
Hyperplane and KLS conjectures. One of the common traits of all the above references is 
that they develop tools to control the Fisher information and use the aforementioned de 
Bruijn's formula to translate the bounds so obtained into bounds on the relative entropy. 

One of the main motivations of the present paper is to initiate a systematic information- 
theoretical analysis of a large class of CLTs that has emerged in recent years in connection 
with different branches of modern stochastic analysis. These limit theorems typically in- 
volve: (a) an underlying infinite dimensional Gaussian field G (like for instance a Wiener 
process), (b) a sequence of rescaled centered random vectors $F_n = F_n(G)$, $n \ge 1$, having
the form of some highly non-linear functional of the field $G$. For example, each
$F_n$ may be defined as some collection of polynomial transformations of $G$, possibly
depending on a parameter that is integrated with respect to a deterministic measure (but
much more general forms are possible). Objects of this type naturally appear e.g. in the 
high-frequency analysis of random fields on homogeneous spaces [35], fractional processes 
[301 135] , Gaussian polymers [35] , or random matrices [T3J [35] . 

In view of their intricate structure, it is in general not possible to meaningfully
represent the vectors $F_n$ in terms of some linear transformation of independent (or weakly
dependent) vectors, so that the usual analytical techniques based on stochastic
independence and convolution (or mixing) cannot be applied. To overcome these difficulties, a
recently developed line of research (see [33] for an introduction) has revealed that, by
using tools from infinite-dimensional Gaussian analysis (e.g. the so-called Malliavin calculus
of variations, see [38]) and under some regularity assumptions on $F_n$, one can control
the distance between the distribution of $F_n$ and that of some Gaussian target by means
of quantities that are no more complex than the fourth moment of $F_n$. The regularity
assumptions on $F_n$ are usually expressed in terms of the projections of each $F_n$ on the
eigenspaces of the Ornstein-Uhlenbeck semigroup associated with G [551 158] . 

In this area, the most prominent contribution is arguably the following one-dimensional
inequality established by the first two authors (see, e.g., [33, Theorem 5.2.6]): let $f$ be
the density of a random variable $F$, assume that $\int_{\mathbb{R}} x^2 f(x)\,dx = 1$, and that $F$ belongs to
the $q$th eigenspace of the Ornstein-Uhlenbeck semigroup of $G$ (customarily called the $q$th
Wiener chaos of $G$), then




where $\phi_1(x) = (2\pi)^{-1/2}\exp(-x^2/2)$ is the standard Gaussian density (one can prove that
$\int_{\mathbb{R}} x^4 (f(x) - \phi_1(x))\,dx > 0$ for $f$ as above). Note that $\int_{\mathbb{R}} x^4 \phi_1(x)\,dx = 3$. A standard use
of hypercontractivity therefore allows one to deduce the so-called fourth moment theorem
established in [40]: for a rescaled sequence $\{F_n\}$ of random variables living inside the $q$th
Wiener chaos of $G$, one has that $F_n$ converges in distribution to a standard Gaussian
random variable if and only if the fourth moment of $F_n$ converges to 3 (and in this case
the convergence is in the sense of total variation). See also [39].

The quantity $\sqrt{\int_{\mathbb{R}} x^4 (f(x) - \phi_1(x))\,dx}$ appearing in (1.1) is often called the kurtosis of
the density $f$: it provides a rough measure of the discrepancy between the 'fatness' of the
tails of $f$ and $\phi_1$. The systematic emergence of the normal distribution from the reduction
of kurtosis, in such a general collection of probabilistic models, is a new phenomenon that
we barely begin to understand. A detailed discussion of these results can be found in [33,
Chapters 5 and 6]. M. Ledoux ^3 has recently proved a striking extension of the fourth 
moment theorem to random variables living in the eigenspaces associated with a general 
Markov operator, whereas references [171122] contain similar statements in the framework 
of free probability. 

The estimate (1.1) is obtained by combining the Malliavin calculus of variations with
Stein's method for normal approximations [15, 33]. Stein's method can be roughly
described as a collection of analytical techniques, allowing one to measure the distance
between random elements by controlling the regularity of the solutions to some specific
ordinary (in dimension 1) or partial (in higher dimensions) differential equations. The
needed estimates are often expressed in terms of the same Stein factors that lie at the
core of the present paper (see Section 2.3 for definitions). It is important to notice
that the strength of these techniques significantly breaks down when dealing with normal
approximations in dimension strictly greater than 1. For instance, in view of the structure
of the associated PDEs, for the time being there is no way to directly use Stein's method in
order to obtain bounds in the multidimensional total variation distance (see JT3H35J). In 
contrast, the results of this paper allow one to deduce a number of information-theoretical
generalisations of (1.1) that are valid in any dimension. It is somewhat remarkable that our
techniques make pervasive use of Stein factors, without ever applying Stein's method.
As an illustration, we present here a multidimensional entropic fourth moment bound that
will be proved in full generality in Section 4. For $d \ge 1$, we write $\phi_d(x) = \phi_d(x_1, \dots, x_d)$ to
indicate the Gaussian density $(2\pi)^{-d/2}\exp(-(x_1^2 + \cdots + x_d^2)/2)$, $(x_1, \dots, x_d) \in \mathbb{R}^d$. From
now on, every random object is assumed to be defined on a common probability space
$(\Omega, \mathcal{F}, P)$, with $E$ denoting expectation with respect to $P$.

Theorem 1.1 (Entropic fourth moment bound) Let $F_n = (F_{1,n}, \dots, F_{d,n})$ be a
sequence of $d$-dimensional random vectors such that: (i) $F_{i,n}$ belongs to the $q_i$th Wiener
chaos of $G$, with $1 \le q_1 \le q_2 \le \cdots \le q_d$; (ii) each $F_{i,n}$ has variance 1, (iii) $E[F_{i,n}F_{j,n}] = 0$
for $i \ne j$, and (iv) the law of $F_n$ admits a density $f_n$ on $\mathbb{R}^d$. Write
\[
\Delta_n := \int_{\mathbb{R}^d} \|x\|^4 \big(f_n(x) - \phi_d(x)\big)\,dx,
\]
where $\|\cdot\|$ stands for the Euclidean norm, and assume that $\Delta_n \to 0$, as $n \to \infty$. Then,
\[
\int_{\mathbb{R}^d} f_n(x) \log\frac{f_n(x)}{\phi_d(x)}\,dx = O(1)\,\Delta_n |\log \Delta_n|, \tag{1.2}
\]
where $O(1)$ stands for a bounded numerical sequence, depending on $d, q_1, \dots, q_d$ and on the
sequence $\{F_n\}$.

As in the one-dimensional case, one has always that $\Delta_n > 0$ for $f_n$ as in the previous
statement. The quantity on the left-hand side of (1.2) equals of course the relative entropy
of f n . In view of the Csiszar-Kullback-Pinsker inequality (see [TBI I23T |4"2] ) . according to 
which
\[
\int_{\mathbb{R}^d} f_n(x) \log\frac{f_n(x)}{\phi_d(x)}\,dx \ge \frac12\left(\int_{\mathbb{R}^d} |f_n(x) - \phi_d(x)|\,dx\right)^2, \tag{1.3}
\]
relation (1.2) then translates into a bound on the square of the total variation distance
between $f_n$ and $\phi_d$, where the dependence on $\Delta_n$ hinges on the order of the chaoses only
via a multiplicative constant. This bound agrees up to a logarithmic factor with the
estimates in smoother distances established in [37] (see also [35]), where it is proved that
there exists a constant $K_0 = K_0(d, q_1, \dots, q_d)$ such that
\[
W_1(f_n, \phi_d) \le K_0\,\Delta_n^{1/2},
\]
where $W_1$ stands for the usual Wasserstein distance of order 1. Relation (1.2) also
drastically improves the bounds that can be deduced from [31], yielding that, as $n \to \infty$,
\[
\int_{\mathbb{R}^d} |f_n(x) - \phi_d(x)|\,dx = O(1)\,\Delta_n^{\alpha_d},
\]
where ad is any strictly positive number verifying ad < i+(d+i)(3+4d< — ^TTT > anc ^ ^ e s y m " 
bol $O(1)$ stands again for some bounded numerical sequence. The estimate (1.2) seems
to be largely outside the scope of any other available technique. Our results will also
show that convergence in relative entropy is a necessary and sufficient condition for CLTs
involving random vectors whose components live in a fixed Wiener chaos. As in [31, 36],
an important tool for establishing our main results is the Carbery-Wright inequality [11],
providing estimates on the small ball probabilities associated with polynomial
transformations of Gaussian vectors. Observe also that, via Talagrand's transport inequality [48],
our bounds trivially provide estimates on the 2-Wasserstein distance $W_2(f_n, \phi_d)$ between
$f_n$ and $\phi_d$, for every $d \ge 1$.

We stress that, although our principal motivation comes from asymptotic problems on a
Gaussian space, the methods developed in Section 2 are general. In fact, at the heart of
the present work lie the powerful equivalences (2.45)-(2.46) (which can be considered as a
new form of so-called Stein identities) that are valid under very weak assumptions on the 
target density; it is also easy to uncover a wide variety of extensions and generalizations 
so that we expect that our tools can be adapted to deal with a much wider class of 
multidimensional distributions. 

The connection between Stein identities and information theory has already been 
noted in the literature (although only in dimension 1). For instance, explicit applications 



are known in the context of Poisson and compound Poisson approximations [45], and
recently several promising identities have been discovered for some discrete [26, 44] as well
as continuous distributions [24, 27, 41]. However, with the exception of [27], the existing
literature seems to be silent about any connection between entropic CLTs and Stein's
identities for normal approximations. To the best of our knowledge, together with [27]
(which however focusses on bounds of a completely different nature) the present paper
contains the first relevant study of the relations between the two topics. 

Remark on notation. Given random vectors $X, Y$ with values in $\mathbb{R}^d$ ($d \ge 1$) and
densities $f_X, f_Y$, respectively, we shall denote by $TV(f_X, f_Y)$ and $W_1(f_X, f_Y)$ the total
variation and 1-Wasserstein distances between $f_X$ and $f_Y$ (and thus between the laws of
$X$ and $Y$). Recall that we have the representations
\[
\begin{aligned}
TV(f_X, f_Y) &= \sup_{A \in \mathcal{B}(\mathbb{R}^d)} \big|P[X \in A] - P[Y \in A]\big|
 = \frac12 \sup_{\|h\|_\infty \le 1} \big|E[h(X)] - E[h(Y)]\big| \\
&= \frac12 \int_{\mathbb{R}^d} |f_X(x) - f_Y(x)|\,dx =: \frac12 \|f_X - f_Y\|_1,
\end{aligned} \tag{1.4}
\]
where (here and throughout the paper) $dx$ is shorthand for the Lebesgue measure on $\mathbb{R}^d$,
as well as
\[
W_1(f_X, f_Y) = \sup_{h \in \mathrm{Lip}(1)} \big|E[h(X)] - E[h(Y)]\big|.
\]

In order to simplify the discussion, we shall sometimes use the shorthand notation
\[
TV(X, Y) = TV(f_X, f_Y) \quad\text{and}\quad W_1(X, Y) = W_1(f_X, f_Y).
\]
It is a well-known fact that the topologies induced by $TV$ and $W_1$, over the class of
probability measures on $\mathbb{R}^d$, are strictly stronger than the topology of convergence in
distribution (see e.g. [T51 Chapter 11] or [33J Appendix C]). Finally, we agree that every 
logarithm in the paper has base e. 

To enhance the readability of the text, the next Subsection 1.2 contains an intuitive
description of our method in dimension one. 

1.2 Illustration of the method in dimension one 

Let $F$ be a random variable with density $f : \mathbb{R} \to [0, \infty)$, and let $Z$ be a standard Gaussian
random variable with density $\phi_1$. We shall assume that $E[F] = 0$ and $E[F^2] = 1$, and that
$Z$ and $F$ are stochastically independent. As anticipated, we are interested in bounding
the relative entropy of $F$ (with respect to $Z$), which is given by the quantity
\[
D(F\|Z) = \int_{\mathbb{R}} f(x) \log\big(f(x)/\phi_1(x)\big)\,dx.
\]
Recall also that, in view of the Pinsker-Csiszar-Kullback inequality, one has that
\[
2\,TV(f, \phi_1) \le \sqrt{2 D(F\|Z)}. \tag{1.5}
\]
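The inequality (1.5) can be checked numerically on a simple family. The following is a minimal sketch, not part of the original argument: it compares $N(0, v)$ with $N(0, 1)$ for a few hypothetical variances $v$ (these laws do not satisfy the normalisation $E[F^2] = 1$ used in this section, but the inequality itself does not need it), using the closed form $D = \frac12(v - 1 - \log v)$ and a midpoint-rule approximation of the $L^1$ distance. The function names and grid parameters are illustrative assumptions.

```python
import math

def gaussian_pdf(x, var=1.0):
    """Density of the centered Gaussian law with variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def total_variation(var, lo=-15.0, hi=15.0, n=20_000):
    """TV between N(0, var) and N(0, 1): half the L1 distance (midpoint rule)."""
    step = (hi - lo) / n
    l1 = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        l1 += abs(gaussian_pdf(x, var) - gaussian_pdf(x)) * step
    return 0.5 * l1

def relative_entropy(var):
    """Closed form of D(N(0, var) || N(0, 1))."""
    return 0.5 * (var - 1.0 - math.log(var))

# Hypothetical variances, chosen only for illustration.
for var in (0.5, 0.9, 1.5, 3.0):
    lhs = 2.0 * total_variation(var)
    rhs = math.sqrt(2.0 * relative_entropy(var))
    print(f"var={var}: 2*TV={lhs:.4f}, sqrt(2*D)={rhs:.4f}")
```

In each case the first printed quantity should not exceed the second, in agreement with (1.5).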

Our aim is to deduce a bound on $D(F\|Z)$ that is expressed in terms of the so-called
Stein factor associated with $F$. Whenever it exists, such a factor is a mapping $\tau_F : \mathbb{R} \to \mathbb{R}$
that is uniquely determined (up to negligible sets) by requiring that $\tau_F(F) \in L^1$ and
\[
E[F g(F)] = E[\tau_F(F) g'(F)]
\]
for every smooth test function $g$. Specifying $g(x) = x$ implies, in particular, that
$E[\tau_F(F)] = E[F^2] = 1$. It is easily seen that, under standard regularity assumptions,
a version of $\tau_F$ is given by $\tau_F(x) = (f(x))^{-1} \int_x^\infty z f(z)\,dz$, for $x$ in the support of $f$ (in
particular, the Stein factor of $Z$ is 1). The relevance of the factor $\tau_F$ in comparing $F$ with
$Z$ is actually revealed by the following Stein's bound [15, 33], which is one of the staples
of Stein's method:



\[
TV(f, \phi_1) \le \sup \big|E[g'(F)] - E[F g(F)]\big|, \tag{1.6}
\]
where the supremum runs over all continuously differentiable functions $g : \mathbb{R} \to \mathbb{R}$
satisfying $\|g\|_\infty \le \sqrt{2/\pi}$ and $\|g'\|_\infty \le 2$. In particular, from (1.6) one recovers the bound in
total variation
\[
TV(f, \phi_1) \le 2 E\big[|1 - \tau_F(F)|\big], \tag{1.7}
\]
providing a formal meaning to the intuitive fact that the distributions of $F$ and $Z$ are
close whenever $\tau_F$ is close to $\tau_Z$, that is, whenever $\tau_F$ is close to 1. To motivate the reader,
we shall now present a simple illustration of how the estimate (1.7) applies to the usual
CLT.



Example 1.2 Let $\{F_i : i \ge 1\}$ be a sequence of i.i.d. copies of $F$, set $S_n = n^{-1/2} \sum_{i=1}^n F_i$
and assume that $E[\tau_F(F)^2] < +\infty$ (a simple sufficient condition for this to hold is e.g.
that f has compact support, and f is bounded from below inside its support). Then, using 
e.g. \J^1\ Lemma 2], 



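\[
\tau_{S_n}(S_n) = \frac1n E\left[\sum_{i=1}^n \tau_F(F_i) \,\Big|\, S_n\right]. \tag{1.8}
\]
Since (by definition) $E[\tau_{S_n}(S_n)] = E[\tau_F(F_i)] = 1$ for all $i = 1, \dots, n$ we get
\[
E\big[(1 - \tau_{S_n}(S_n))^2\big] = E\left[E\left[\frac1n \sum_{i=1}^n (1 - \tau_F(F_i)) \,\Big|\, S_n\right]^2\right]
\le \frac{1}{n^2} \mathrm{Var}\left(\sum_{i=1}^n (1 - \tau_F(F_i))\right) = \frac{\mathrm{Var}(\tau_F(F))}{n}. \tag{1.9}
\]
In particular, writing $f_n$ for the density of $S_n$, we deduce from (1.7) that
\[
TV(f_n, \phi_1) \le \frac{2\,\mathrm{Var}(\tau_F(F))^{1/2}}{\sqrt{n}}. \tag{1.10}
\]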
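To make the Stein factor concrete, here is a small numerical sketch for one hypothetical choice of $F$, namely the uniform law on $[-\sqrt3, \sqrt3]$ (mean 0, variance 1, compactly supported with density bounded from below on its support). For this law the formula $\tau_F(x) = (f(x))^{-1}\int_x^\infty z f(z)\,dz$ gives $\tau_F(x) = (3 - x^2)/2$. The code checks the identity $E[Fg(F)] = E[\tau_F(F)g'(F)]$ for the test function $g = \sin$ (an arbitrary choice) and evaluates the right-hand side of (1.10). All function names are illustrative assumptions.

```python
import math

A = math.sqrt(3.0)  # F uniform on [-A, A] has mean 0 and variance 1

def expect(h, n=20_000):
    """E[h(F)] for F uniform on [-A, A], approximated by the midpoint rule."""
    step = 2.0 * A / n
    return sum(h(-A + (k + 0.5) * step) for k in range(n)) * step / (2.0 * A)

def tau(x):
    """Stein factor of the uniform law: (1/f(x)) * int_x^A z f(z) dz = (3 - x^2)/2."""
    return (3.0 - x * x) / 2.0

# Integration by parts identity with the test function g = sin, g' = cos.
lhs = expect(lambda x: x * math.sin(x))        # E[F g(F)]
rhs = expect(lambda x: tau(x) * math.cos(x))   # E[tau_F(F) g'(F)]

# Var(tau_F(F)) = E[(1 - tau_F(F))^2], since E[tau_F(F)] = 1.
var_tau = expect(lambda x: (1.0 - tau(x)) ** 2)

def tv_bound(n):
    """Right-hand side of (1.10) for the normalised sum of n uniform summands."""
    return 2.0 * math.sqrt(var_tau / n)

print(lhs, rhs, var_tau, tv_bound(100))
```

Here `var_tau` approximates $\mathrm{Var}(\tau_F(F)) = (E[F^4] - 1)/4 = 1/5$, so the bound (1.10) reads $TV(f_n, \phi_1) \le 2/\sqrt{5n}$ in this particular case.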



We shall demonstrate in Section 3 and Section 4 that the quantity $E[|1 - \tau_F(F)|]$
(as well as its multidimensional generalisations) can be explicitly controlled whenever $F$
is a smooth functional of a Gaussian field. In view of these observations, the following
question is therefore natural: can one bound $D(F\|Z)$ by an expression analogous to the
right-hand side of (1.7)?

Our strategy for connecting $\tau_F(F)$ and $D(F\|Z)$ is based on an integral version of the
classical de Bruijn's formula of information theory. To introduce this result, for $t \in [0, 1]$
denote by $f_t$ the density of $F_t = \sqrt{t}F + \sqrt{1 - t}Z$, in such a way that $f_1 = f$ and $f_0 = \phi_1$.
Of course $f_t(x) = E\big[\phi_1\big((x - \sqrt{t}F)/\sqrt{1 - t}\big)\big]/\sqrt{1 - t}$ has support $\mathbb{R}$ and is $C^\infty$ for all
$t < 1$. We shall denote by $\rho_t = (\log f_t)'$ the score function of $F_t$ (which is, by virtue of
the preceding remark, well defined at all $t < 1$ irrespective of the properties of $F$). For
every $t < 1$, the mapping $\rho_t$ is completely characterised by the fact that
\[
E[g'(F_t)] = -E[g(F_t)\rho_t(F_t)] \tag{1.11}
\]
for every smooth test function $g$. We also write, for $t \in [0, 1)$,
\[
J(F_t) = E[\rho_t(F_t)^2] = \int_{\mathbb{R}} \frac{f_t'(x)^2}{f_t(x)}\,dx
\]
for the Fisher information of $F_t$, and we observe that
\[
0 \le E[(F_t + \rho_t(F_t))^2] = J(F_t) - 1 =: J_{st}(F_t),
\]
where $J_{st}(F_t)$ is the so-called standardised Fisher information of $F_t$ (note that $J_{st}(F_0) =
J_{st}(Z) = 0$). With this notation in mind, de Bruijn's formula (in an integral and rescaled
version due to Barron [5]) reads
\[
D(F\|Z) = \int_0^1 \frac{J(F_t) - 1}{2t}\,dt = \int_0^1 \frac{J_{st}(F_t)}{2t}\,dt \tag{1.12}
\]
(see Lemma 2.3 below for a multidimensional statement).

Remark 1.3 Using the standard relation $J_{st}(F_t) \le tJ_{st}(F) + (1 - t)J_{st}(Z) = tJ_{st}(F)$
(see e.g. [19, Lemma 1.21]), we deduce the upper bound
\[
D(F\|Z) \le \frac12 J_{st}(F), \tag{1.13}
\]
a result which is often proved by using entropy power inequalities (see also Shimizu \4&j ) ■ 
Formula (1.13) is a quantitative counterpart to the intuitive fact that the distributions
of $F$ and $Z$ are close, whenever $J_{st}(F)$ is close to zero. Using (1.5) we further deduce
that closeness between the Fisher informations of $F$ and $Z$ (i.e. $J_{st}(F) \approx 0$) or between
the entropies of $F$ and $Z$ (i.e. $D(F\|Z) \approx 0$) both imply closeness in terms of the total
variation distance, and hence in terms of many more probability metrics. This observation
lies at the heart of the approach from J31 \1(A \2U] where a fine analysis of the behavior of 
$\rho_F(F)$ over convolutions (through projection inequalities in the spirit of (1.8)) is used to
provide explicit bounds on the Fisher information distance which in turn are transformed,
by means of de Bruijn's identity (1.12), into bounds on the relative entropy. We will see
in Section R] that the bound (|1 . 13|) is too crude to be of use in the applications we are 
interested in. 

Our key result in dimension 1 is the following statement (see Theorem 2.10 for a
general multidimensional version), providing a new representation of relative entropy in
terms of Stein factors. From now on, we denote by $C_c^1$ the class of all functions $g : \mathbb{R} \to \mathbb{R}$
that are continuously differentiable and with compact support.

Proposition 1.4 Let the previous notation prevail. We have
\[
D(F\|Z) = \frac12 \int_0^1 \frac{t}{1 - t}\, E\big[E[Z(1 - \tau_F(F)) \,|\, F_t]^2\big]\,dt. \tag{1.14}
\]



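Proof. Using $\rho_Z(Z) = -Z$ we see that, for any function $g \in C_c^1$, one has
\[
\begin{aligned}
E\big[Z(1 - \tau_F(F))\,g(\sqrt{t}F + \sqrt{1 - t}Z)\big]
&= \sqrt{1 - t}\,E\big[(1 - \tau_F(F))\,g'(\sqrt{t}F + \sqrt{1 - t}Z)\big] \\
&= \sqrt{1 - t}\,\Big\{E[g'(F_t)] - \frac{1}{\sqrt{t}}E[Fg(F_t)]\Big\} \\
&= \frac{\sqrt{1 - t}}{t}\,\big\{E[g'(F_t)] - E[F_t g(F_t)]\big\}
 = -\frac{\sqrt{1 - t}}{t}\,E\big[(\rho_t(F_t) + F_t)\,g(F_t)\big],
\end{aligned}
\]
yielding the representation
\[
\rho_t(F_t) + F_t = -\frac{t}{\sqrt{1 - t}}\,E[Z(1 - \tau_F(F)) \,|\, F_t]. \tag{1.15}
\]
This implies
\[
J(F_t) - 1 = E[(\rho_t(F_t) + F_t)^2] = \frac{t^2}{1 - t}\,E\big[E[Z(1 - \tau_F(F)) \,|\, F_t]^2\big],
\]
and the desired conclusion follows from de Bruijn's identity (1.12). ■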

To properly control the integral on the right-hand side of (1.14), we need to deal with
the fact that the mapping $t \mapsto \frac{t}{1 - t}$ is not integrable at $t = 1$, so that we cannot directly
apply the estimate $E[E[Z(1 - \tau_F(F)) \,|\, F_t]^2] \le \mathrm{Var}(\tau_F(F))$ to deduce the desired bound.
Intuitively, one has to exploit the fact that the mapping $t \mapsto E[Z(1 - \tau_F(F)) \,|\, F_t]$ satisfies
$E[Z(1 - \tau_F(F)) \,|\, F_1] = 0$, thus in principle compensating for the singularity at $t \approx 1$.

As we will see below, one can make this heuristic precise provided there exist three
constants $c, \delta, \eta > 0$ such that
\[
E[|\tau_F(F)|^{2+\eta}] < \infty \quad\text{and}\quad E\big[|E[Z(1 - \tau_F(F)) \,|\, F_t]|\big] \le c\,t^{-1}(1 - t)^{\delta}, \quad 0 < t \le 1. \tag{1.16}
\]

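Under the assumptions appearing in condition (1.16), the following strategy can indeed
be implemented in order to deduce a satisfactory bound. First split the integral in two
parts: for every $0 < \varepsilon \le 1$,
\[
\begin{aligned}
2D(F\|Z)
&\le E[(1 - \tau_F(F))^2] \int_0^{1-\varepsilon} \frac{t}{1 - t}\,dt + \int_{1-\varepsilon}^1 \frac{t}{1 - t}\,E\big[E[Z(1 - \tau_F(F)) \,|\, F_t]^2\big]\,dt \\
&\le E[(1 - \tau_F(F))^2]\,|\log \varepsilon| + \int_{1-\varepsilon}^1 \frac{t}{1 - t}\,E\big[E[Z(1 - \tau_F(F)) \,|\, F_t]^2\big]\,dt,
\end{aligned} \tag{1.17}
\]
the last inequality being a consequence of $\int_0^{1-\varepsilon} \frac{t}{1 - t}\,dt = \int_\varepsilon^1 \frac{1 - u}{u}\,du \le \int_\varepsilon^1 \frac{du}{u} = -\log \varepsilon$. To
deal with the second term in (1.17), let us observe that, by using in particular the Hölder
inequality and the convexity of the function $x \mapsto |x|^{\eta+2}$, one deduces from (1.16) that
\[
\begin{aligned}
E\big[E[Z(1 - \tau_F(F)) \,|\, F_t]^2\big]
&= E\Big[\big|E[Z(1 - \tau_F(F)) \,|\, F_t]\big|^{\frac{\eta}{\eta+1}}\,\big|E[Z(1 - \tau_F(F)) \,|\, F_t]\big|^{\frac{\eta+2}{\eta+1}}\Big] \\
&\le E\big[|E[Z(1 - \tau_F(F)) \,|\, F_t]|\big]^{\frac{\eta}{\eta+1}} \times E\big[|E[Z(1 - \tau_F(F)) \,|\, F_t]|^{\eta+2}\big]^{\frac{1}{\eta+1}} \\
&\le c^{\frac{\eta}{\eta+1}}\,t^{-\frac{\eta}{\eta+1}}(1 - t)^{\frac{\delta\eta}{\eta+1}} \times E[|Z|^{\eta+2}]^{\frac{1}{\eta+1}} \times E[|1 - \tau_F(F)|^{\eta+2}]^{\frac{1}{\eta+1}} \\
&\le c^{\frac{\eta}{\eta+1}}\,t^{-\frac{\eta}{\eta+1}}(1 - t)^{\frac{\delta\eta}{\eta+1}} \times 2\,E[|Z|^{\eta+2}]^{\frac{1}{\eta+1}}\big(1 + E[|\tau_F(F)|^{\eta+2}]\big)^{\frac{1}{\eta+1}} \\
&= C_\eta\,t^{-\frac{\eta}{\eta+1}}(1 - t)^{\frac{\delta\eta}{\eta+1}},
\end{aligned} \tag{1.18}
\]
with
\[
C_\eta := 2c^{\frac{\eta}{\eta+1}}\,E[|Z|^{\eta+2}]^{\frac{1}{\eta+1}}\big(1 + E[|\tau_F(F)|^{\eta+2}]\big)^{\frac{1}{\eta+1}}.
\]
By virtue of (1.17) and (1.18), the term $D(F\|Z)$ is eventually amenable to analysis, and
one obtains:
\[
\begin{aligned}
2D(F\|Z) &\le E[(1 - \tau_F(F))^2]\,|\log \varepsilon| + C_\eta \int_{1-\varepsilon}^1 (1 - t)^{\frac{\delta\eta}{\eta+1} - 1}\,dt \\
&= E[(1 - \tau_F(F))^2]\,|\log \varepsilon| + \frac{C_\eta(\eta + 1)}{\delta\eta}\,\varepsilon^{\frac{\delta\eta}{\eta+1}}.
\end{aligned}
\]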

Assuming finally that $E[(1 - \tau_F(F))^2] \le 1$ (recall that, in the applications we are interested
in, such a quantity is meant to be close to 0) we can optimize over $\varepsilon$ and choose $\varepsilon =
E[(1 - \tau_F(F))^2]^{\frac{\eta+1}{\delta\eta}}$, which leads to
\[
D(F\|Z) \le \frac{\eta + 1}{2\delta\eta}\,E[(1 - \tau_F(F))^2]\,\big|\log E[(1 - \tau_F(F))^2]\big|
+ \frac{C_\eta(\eta + 1)}{2\delta\eta}\,E[(1 - \tau_F(F))^2]. \tag{1.19}
\]
Clearly, combining (1.19) with (1.5), one also obtains an estimate in total variation which
agrees with (1.7) up to the square root of a logarithmic factor.
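The passage from the split bound to (1.19) is a direct substitution, which the following sketch verifies numerically. It is only an illustration under stated assumptions: the names `split_bound` and `entropy_bound`, and the numerical values of $\Delta = E[(1 - \tau_F(F))^2]$, $C_\eta$, $\eta$ and $\delta$ used below, are hypothetical and not taken from the paper.

```python
import math

def split_bound(delta_cap, eps, c_eta, eta, dlt):
    """Bound on 2*D(F||Z): Delta*|log eps| + C_eta*(eta+1)/(delta*eta) * eps**(delta*eta/(eta+1))."""
    expo = dlt * eta / (eta + 1.0)
    return delta_cap * abs(math.log(eps)) + (c_eta / expo) * eps ** expo

def entropy_bound(delta_cap, c_eta, eta, dlt):
    """Right-hand side of (1.19), a bound on D(F||Z) itself."""
    k = (eta + 1.0) / (2.0 * dlt * eta)
    return k * delta_cap * abs(math.log(delta_cap)) + c_eta * k * delta_cap

# Hypothetical parameter values, for illustration only.
delta_cap, c_eta, eta, dlt = 0.01, 5.0, 1.0, 0.5

# Choice of eps made in the text: eps = Delta**((eta+1)/(delta*eta)).
eps = delta_cap ** ((eta + 1.0) / (dlt * eta))

half_split = 0.5 * split_bound(delta_cap, eps, c_eta, eta, dlt)
print(half_split, entropy_bound(delta_cap, c_eta, eta, dlt))
```

With this choice of $\varepsilon$ one has $|\log\varepsilon| = \frac{\eta+1}{\delta\eta}|\log\Delta|$ and $\varepsilon^{\delta\eta/(\eta+1)} = \Delta$, so the two printed numbers agree up to rounding.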

The problem is now how to identify sufficient conditions on the law of $F$ for (1.16) to
hold; we shall address this issue by means of two auxiliary results. We start with a useful 
technical lemma, that has been suggested to us by Guillaume Poly. 

Lemma 1.5 Let $X$ be an integrable random variable and let $Y$ be a $\mathbb{R}^d$-valued random
vector having an absolutely continuous distribution. Then
\[
E\big|E[X|Y]\big| = \sup E[Xg(Y)], \tag{1.20}
\]
where the supremum is taken over all $g \in C_c^1$ such that $\|g\|_\infty \le 1$.
Proof. Since $|\mathrm{sign}(E[X|Y])| = 1$ we have, by using e.g. Lusin's Theorem,
\[
E\big|E[X|Y]\big| = E\big[X\,\mathrm{sign}(E[X|Y])\big] \le \sup E[Xg(Y)].
\]
To see the reversed inequality, observe that, for any $g$ bounded by 1,
\[
|E[Xg(Y)]| = \big|E\big[E[X|Y]\,g(Y)\big]\big| \le E\big|E[X|Y]\big|.
\]
The lemma is proved. ■

Our next statement relates (1.16) to the problem of estimating the total variation distance
between $F$ and $\sqrt{t}F + \sqrt{1 - t}\,x$ for any $x \in \mathbb{R}$ and $0 < t \le 1$.

Lemma 1.6 Assume that, for some $\kappa, \alpha > 0$,
\[
TV(\sqrt{t}F + \sqrt{1 - t}\,x, F) \le \kappa(1 + |x|)\,t^{-1}(1 - t)^{\alpha}, \quad x \in \mathbb{R},\ t \in (0, 1]. \tag{1.21}
\]
Then (1.16) holds, with $\delta = \frac12 \wedge \alpha$ and $c = 4(\kappa + 1)$.



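Proof. Take $g \in C_c^1$ such that $\|g\|_\infty \le 1$. Then, by independence of $Z$ and $F$,
\[
\begin{aligned}
E[Z(1 - \tau_F(F))g(F_t)] &= E[g(F_t)Z] - E[Zg(F_t)\tau_F(F)] \\
&= E[g(F_t)Z] - \sqrt{1 - t}\,E[\tau_F(F)g'(F_t)] \\
&= E[Z(g(F_t) - g(F))] - \sqrt{\frac{1 - t}{t}}\,E[g(F_t)F],
\end{aligned}
\]
so that, since $\|g\|_\infty \le 1$ and $E|F| \le \sqrt{E[F^2]} = 1$,
\[
\big|E[Z(1 - \tau_F(F))g(F_t)]\big| \le \big|E[Z(g(F_t) - g(F))]\big| + \sqrt{\frac{1 - t}{t}}
\le \big|E[Z(g(F_t) - g(F))]\big| + t^{-1}\sqrt{1 - t}.
\]
We have furthermore
\[
\begin{aligned}
\big|E[Z(g(F_t) - g(F))]\big| &= \left|\int_{\mathbb{R}} x\,E\big[g(\sqrt{t}F + \sqrt{1 - t}\,x) - g(F)\big]\,\phi_1(x)\,dx\right| \\
&\le 2\int_{\mathbb{R}} |x|\,TV(\sqrt{t}F + \sqrt{1 - t}\,x, F)\,\phi_1(x)\,dx \\
&\le 2\kappa\,t^{-1}(1 - t)^{\alpha} \int_{\mathbb{R}} |x|(1 + |x|)\,\phi_1(x)\,dx \\
&\le 4\kappa\,t^{-1}(1 - t)^{\alpha}.
\end{aligned}
\]
Inequality (1.16) now follows by applying Lemma 1.5. ■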
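The last step of the proof uses that $\int_{\mathbb{R}} |x|(1 + |x|)\phi_1(x)\,dx = E|Z| + E[Z^2] = \sqrt{2/\pi} + 1 \le 2$. A minimal numerical sketch of this constant follows; the quadrature grid and the function names are illustrative assumptions.

```python
import math

def phi1(x):
    """Standard Gaussian density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def gaussian_integral(h, lo=-12.0, hi=12.0, n=48_000):
    """Approximate the integral of h(x) * phi1(x) over the real line (midpoint rule)."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        total += h(x) * phi1(x) * step
    return total

value = gaussian_integral(lambda x: abs(x) * (1.0 + abs(x)))
exact = math.sqrt(2.0 / math.pi) + 1.0  # E|Z| + E[Z^2]
print(value, exact)
```

Since the value is below 2, the factor $2\kappa$ in the third line of the display is indeed bounded by $4\kappa$ after integration.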

As anticipated, in Section 4 (see Lemma 4.4 for a precise statement) we will describe
a wide class of distributions satisfying (1.21). The previous discussion yields finally the
following statement, answering the original question of providing a bound on $D(F\|Z)$
that is comparable with the estimate (1.7).

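Theorem 1.7 Let $F$ be a random variable with density $f : \mathbb{R} \to [0, \infty)$, satisfying $E[F] = 0$
and $E[F^2] = 1$. Let $Z \sim \mathcal{N}(0, 1)$ be a standard Gaussian variable (independent of $F$).
If, for some $\alpha, \kappa, \eta > 0$, one has
\[
E[|\tau_F(F)|^{2+\eta}] < \infty \tag{1.22}
\]
and
\[
TV(\sqrt{t}F + \sqrt{1 - t}\,x, F) \le \kappa(1 + |x|)\,t^{-1}(1 - t)^{\alpha}, \quad x \in \mathbb{R},\ t \in (0, 1], \tag{1.23}
\]
then, provided $\Delta := E[(1 - \tau_F(F))^2] \le 1$,
\[
D(F\|Z) \le \frac{\eta + 1}{(1 \wedge 2\alpha)\eta}\,\Delta|\log \Delta| + \frac{C_\eta(\eta + 1)}{(1 \wedge 2\alpha)\eta}\,\Delta, \tag{1.24}
\]
where
\[
C_\eta = 2(4\kappa + 4)^{\frac{\eta}{\eta+1}}\,E[|Z|^{\eta+2}]^{\frac{1}{\eta+1}}\big(1 + E[|\tau_F(F)|^{\eta+2}]\big)^{\frac{1}{\eta+1}}.
\]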

1.3 Plan 

The rest of the paper is organised as follows. In Section 2 we will prove that Theorem 1.7
can be generalised to a fully multidimensional setting. Section 3 contains some general
results related to (infinite-dimensional) Gaussian stochastic analysis. Finally, in Section 4
we shall apply our estimates in order to deduce general bounds of the type appearing in
Theorem 1.1.
2 Entropy bounds via de Bruijn's identity and Stein
matrices 

In Section 2.1 and Section 2.2 we discuss some preliminary notions related to the theory
of information (definitions, notations and main properties). Section 2.3 contains the proof
of a new integral formula, allowing one to represent the relative entropy of a given random
vector in terms of a Stein matrix. The reader is referred to the monograph [19], as well
as to PQ Chapter 10], for any unexplained definition and result concerning information 
theory. 

2.1 Entropy 

Fix an integer $d \ge 1$. Throughout this section, we consider a $d$-dimensional
square-integrable and centered random vector $F = (F_1, \dots, F_d)$ with covariance matrix $B > 0$.
We shall assume that the law of $F$ admits a density $f = f_F$ (with respect to the Lebesgue
measure) with support $S \subseteq \mathbb{R}^d$. No other assumptions on the distribution of $F$ will be
needed. We recall that the differential entropy (or, simply, the entropy) of $F$ is given by the
quantity $\mathrm{Ent}(F) := -E[\log f(F)] = -\int_{\mathbb{R}^d} f(x)\log f(x)\,dx = -\int_S f(x)\log f(x)\,dx$, where
we have adopted (here and for the rest of the paper) the standard convention $0 \log 0 := 0$.
Note that $\mathrm{Ent}(F) = \mathrm{Ent}(F + c)$ for all $c \in \mathbb{R}^d$, i.e. entropy is location invariant.

As discussed above, we are interested in estimating the distance between the law of
$F$ and the law of a $d$-dimensional centered Gaussian vector $Z = (Z_1, \dots, Z_d) \sim \mathcal{N}_d(0, C)$,
where $C > 0$ is the associated covariance matrix. Our measure of the discrepancy
between the distributions of $F$ and $Z$ is the relative entropy (often called Kullback-Leibler
divergence or information entropy)
\[
D(F\|Z) := E[\log(f(F)/\phi(F))] = \int_{\mathbb{R}^d} f(x) \log\left(\frac{f(x)}{\phi(x)}\right) dx, \tag{2.25}
\]
where $\phi = \phi_d(\cdot\,; C)$ is the density of $Z$. It is easy to compute the Gaussian entropy
$\mathrm{Ent}(Z) = \frac12 \log\big((2\pi e)^d |C|\big)$ (where $|C|$ is the determinant of $C$), from which we deduce
the following alternative expression for the relative entropy
\[
D(F\|Z) = \mathrm{Ent}(Z) - \mathrm{Ent}(F) + \frac{\mathrm{tr}(C^{-1}B) - d}{2}, \tag{2.26}
\]
where 'tr' stands for the usual trace operator. If $Z$ and $F$ have the same covariance
matrix then the relative entropy is simply the entropy gap between $F$ and $Z$ so that,
in particular, one infers from (2.26) that $Z$ has maximal entropy among all absolutely
continuous random vectors with covariance matrix $C$.

We stress that the relative entropy D does not define a bona fide probability dis- 
tance (for absence of a triangle inequality, as well as for lack of symmetry): however, 
one can easily translate estimates on the relative entropy in terms of the total variation 
distance, using the already recalled Pinsker-Csiszar-Kullback inequality (1.3). In the next
subsection, we show how one can represent the quantity D(F\\Z) as the integral of the 
standardized Fisher information of some adequate interpolation between F and Z. 

2.2 Fisher information and de Bruijn's identity 

Without loss of generality, we may assume for the rest of the paper that the vectors
$F$ and $Z$ (as defined in the previous Section 2.1) are stochastically independent. For
every $t \in [0, 1]$, we define the centered random vector $F_t := \sqrt{t}F + \sqrt{1 - t}Z$, in such
a way that $F_0 = Z$ and $F_1 = F$. It is clear that $F_t$ is centered and has covariance
$\Gamma_t = tB + (1 - t)C > 0$; moreover, whenever $t \in [0, 1)$, $F_t$ has a strictly positive and
infinitely differentiable density, that we shall denote by $f_t$ (see e.g. [21, Lemma 3.1] for
more details). For every $t \in [0, 1)$, we define the score of $F_t$ as the $\mathbb{R}^d$-valued function
given by

\[
\rho_t : \mathbb{R}^d \to \mathbb{R}^d : x \mapsto \rho_t(x) = (\rho_{t,1}(x), \dots, \rho_{t,d}(x))^T := \nabla \log f_t(x), \tag{2.27}
\]
with $\nabla$ the usual gradient in $\mathbb{R}^d$ (note that we will systematically regard the elements of
$\mathbb{R}^d$ as column vectors). The quantity $\rho_t(x)$ is of course well-defined for every $x \in \mathbb{R}^d$ and
every $t \in [0, 1)$; moreover, it is easily seen that the random vector $\rho_t(F_t)$ is completely
characterized (up to sets of $P$-measure zero) by the relation
\[
E[\rho_t(F_t)g(F_t)] = -E[\nabla g(F_t)], \tag{2.28}
\]
holding for every smooth function $g : \mathbb{R}^d \to \mathbb{R}$. Selecting $g = 1$ in (2.28), one sees that
$\rho_t(F_t)$ is a centered random vector. The covariance matrix of $\rho_t(F_t)$ is denoted by
\[
J(F_t) := E[\rho_t(F_t)\rho_t(F_t)^T] \tag{2.29}
\]
(with components $J(F_t)_{ij} = E[\rho_{t,i}(F_t)\rho_{t,j}(F_t)]$ for $1 \le i, j \le d$), and is customarily called
the Fisher information matrix of $F_t$. Focussing on the case $t = 0$, one sees immediately
that the Gaussian vector $F_0 = Z \sim \mathcal{N}_d(0, C)$ has linear score function $\rho_0(x) = \rho_Z(x) =
-C^{-1}x$ and Fisher information $J(F_0) = J(Z) = C^{-1}$.

Remark 2.1 Fix $t \in [0, 1)$. Using formula (2.28) one deduces that a version of $\rho_t(F_t)$
is given by the conditional expectation $-(1 - t)^{-1/2}E[C^{-1}Z \,|\, F_t]$, from which we infer that
the matrix $J(F_t)$ is well-defined and its entries are all finite.

For $t \in [0, 1)$, we define the standardized Fisher information matrix of $F_t$ as
\[
J_{st}(F_t) := \Gamma_t\,E\big[(\rho_t(F_t) + \Gamma_t^{-1}F_t)(\rho_t(F_t) + \Gamma_t^{-1}F_t)^T\big] = \Gamma_t J(F_t) - I_d, \tag{2.30}
\]
where $I_d$ is the $d \times d$ identity matrix, and the last equality holds because $E[\rho_t(F_t)F_t^T] =
-I_d$. Note that the positive semidefinite matrix $\Gamma_t^{-1}J_{st}(F_t) = J(F_t) - \Gamma_t^{-1}$ is the
difference between the Fisher information matrix of $F_t$ and that of a Gaussian vector having
distribution $\mathcal{N}_d(0, \Gamma_t)$. Observe that
\[
J_{st}(F_t) = E\big[(\rho_t^\star(F_t) + F_t)(\rho_t^\star(F_t) + F_t)^T\big]\,\Gamma_t^{-1}, \tag{2.31}
\]
where the vector
\[
\rho_t^\star(F_t) = (\rho_{t,1}^\star(F_t), \dots, \rho_{t,d}^\star(F_t))^T := \Gamma_t\,\rho_t(F_t) \tag{2.32}
\]
is completely characterized (up to sets of $P$-measure 0) by the equation
\[
E[\rho_t^\star(F_t)g(F_t)] = -\Gamma_t\,E[\nabla g(F_t)], \tag{2.33}
\]
holding for every smooth test function $g$.

Remark 2.2 Of course the above information theoretic quantities are not defined only
for Gaussian mixtures of the form $F_t$ but more generally for any random vector satisfying
the relevant assumptions (which are necessarily verified by the $F_t$). In particular, if $F$ has
covariance matrix $B$ and differentiable density $f$ then, letting $\rho_F(x) := \nabla \log f(x)$ be the
score function for $F$, the standardized Fisher information of $F$ is
\[
J_{st}(F) = B\,E\big[\rho_F(F)\rho_F(F)^T\big] - I_d, \tag{2.34}
\]
in agreement with (2.30). Whenever the above quantity is well-defined, it is also scale
invariant in the sense that $J_{st}(aF) = J_{st}(F)$ for all $a \in \mathbb{R}$, $a \ne 0$.
The following fundamental result is known as the (multidimensional) de Bruijn's
identity: it shows that the relative entropy $D(F\|Z)$ can be represented in terms of the integral
of the mapping $t \mapsto \mathrm{tr}(C\Gamma_t^{-1}J_{st}(F_t))$ with respect to the measure $dt/2t$ on $(0, 1]$. It is one
of the staples of the entire paper. We refer the reader e.g. to [TJ [5] for proofs in the case 
d — 1. Our multidimensional statement is a rescaling of J211 Theorem 2.3] (some more 
details are given in the proof). 

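Lemma 2.3 (Multivariate de Bruijn's identity) Let the above notation and assumptions prevail. Then,

$$D(F\|Z) = \int_0^1 \frac{1}{2t}\,\mathrm{tr}\left(C\Gamma_t^{-1}J_{st}(F_t)\right)dt + \frac{1}{2}\left(\mathrm{tr}\left(C^{-1}B\right) - d\right) + \int_0^1 \frac{1}{2t}\,\mathrm{tr}\left(C\Gamma_t^{-1} - I_d\right)dt. \tag{2.35}$$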

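Proof. In [21, Theorem 2.3] it is proved that

$$D(F\|Z) = \int_0^\infty \frac{1}{2}\,\mathrm{tr}\left(C(B+\tau C)^{-1}J_{st}(F + \sqrt{\tau}Z)\right)d\tau + \frac{1}{2}\left(\mathrm{tr}\left(C^{-1}B\right) - d\right) + \frac{1}{2}\int_0^\infty \mathrm{tr}\left(C\left((B+\tau C)^{-1} - \frac{C^{-1}}{1+\tau}\right)\right)d\tau$$

(note that the definition of standardized Fisher information used in [21] is different from ours). The conclusion is obtained by using the change of variables $t = (1+\tau)^{-1}$, under which $B + \tau C = \Gamma_t/t$, as well as the fact that

$$J_{st}\left(F + \sqrt{\frac{1-t}{t}}\,Z\right) = J_{st}\left(\sqrt{t}F + \sqrt{1-t}\,Z\right),$$

which follows from the scale-invariance of standardized Fisher information mentioned in Remark 2.2. ■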



Remark 2.4 Assume that $C_n$, $n \geq 1$, is a sequence of $d \times d$ nonsingular covariance matrices such that $C_{n;i,j} \to B_{i,j}$ for every $i,j = 1,\ldots,d$, as $n \to \infty$. Then, the second and third summands of (2.35) (with $C_n$ replacing $C$) converge to 0 as $n \to \infty$.

For future reference, we will now rewrite formula (2.35) for some specific choices of $d$, $F$, $B$ and $C$.

Example 2.5 (i) Assume $F \sim \mathcal{N}_d(0,B)$. Then, $J_{st}(F_t) = 0$ (null matrix) for every $t \in [0,1)$, and formula (2.35) becomes

$$D(F\|Z) = \frac{1}{2}\left(\mathrm{tr}\left(C^{-1}B\right) - d\right) + \int_0^1 \frac{1}{2t}\,\mathrm{tr}\left(C\Gamma_t^{-1} - I_d\right)dt. \tag{2.36}$$

(ii) Assume that $d = 1$ and that $F$ and $Z$ have variances $b, c > 0$, respectively. Defining $\gamma_t = tb + (1-t)c$, relation (2.35) becomes

$$\begin{aligned}
D(F\|Z) &= \int_0^1 \frac{c}{2t\gamma_t}\,J_{st}(F_t)\,dt + \frac{1}{2}\left(\frac{b}{c} - 1\right) + \int_0^1 \frac{1}{2t}\left(\frac{c}{\gamma_t} - 1\right)dt \\
&= \int_0^1 \frac{c}{2t\gamma_t}\,J_{st}(F_t)\,dt + \frac{1}{2}\left(\frac{b}{c} - 1\right) + \frac{\log c - \log b}{2}.
\end{aligned} \tag{2.37}$$

Relation (2.37) in the case $b = c$ ($= \gamma_t$) corresponds to the integral formula (1.12) proved by Barron in [3, Lemma 1].

(iii) If $B = C$, then (2.35) takes the form

$$D(F\|Z) = \int_0^1 \frac{1}{2t}\,\mathrm{tr}\left(J_{st}(F_t)\right)dt. \tag{2.38}$$

In the special case where $B = C = I_d$, one has that

$$D(F\|Z) = \frac{1}{2}\sum_{j=1}^d \int_0^1 \frac{1}{t}\,E\left[\left(\rho_{t,j}(F_t) + F_{t,j}\right)^2\right]dt, \tag{2.39}$$

of which (1.12) is a particular case ($d = 1$).
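As a sanity check of Example 2.5, the following minimal sketch compares, in dimension one and for a Gaussian $F$ (so that the standardized Fisher information term vanishes), the right-hand side of (2.37) with the classical closed form of the relative entropy between two centered Gaussian laws. The function names and the trapezoidal discretization are illustrative choices, not part of the original argument.

```python
import numpy as np

def kl_gauss_1d(b, c):
    """Closed-form D(N(0,b) || N(0,c)) for variances b, c > 0."""
    return 0.5 * (b / c - 1.0 - np.log(b / c))

def de_bruijn_gauss_1d(b, c, n=200_001):
    """Right-hand side of (2.37) when F is Gaussian, i.e. J_st(F_t) = 0."""
    t = np.linspace(0.0, 1.0, n)
    gamma = t * b + (1.0 - t) * c              # gamma_t = t b + (1 - t) c
    # (1/(2t)) (c/gamma_t - 1) simplifies to (c - b) / (2 gamma_t),
    # which is finite at t = 0, so we integrate it in that form.
    integrand = (c - b) / (2.0 * gamma)
    dt = t[1] - t[0]
    integral = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return 0.5 * (b / c - 1.0) + integral

closed_form = kl_gauss_1d(2.0, 0.5)
via_integral = de_bruijn_gauss_1d(2.0, 0.5)
print(closed_form, via_integral)
```

Both numbers agree up to the discretization error of the trapezoidal rule, which illustrates how the second and third summands of (2.35) account for the mismatch between the two covariances.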

In the univariate setting, the general variance case ($E[F^2] = \sigma^2$) follows trivially from the standardized one ($E[F^2] = 1$) through scaling; the same cannot be said in the multivariate setting, since the appealing form (2.39) cannot be directly achieved for $d \geq 2$ when the covariance matrices are not the identity, because here the dependence structure of $F$ needs to be taken into account. In Lemma 2.6 we provide an estimate allowing one to deal with this difficulty in the case $B = C$, for every $d$. The proof is based on the following elementary fact: if $A, B$ are two $d \times d$ symmetric matrices, and if $A$ is positive semidefinite, then

$$\lambda_{\min}(B) \times \mathrm{tr}(A) \leq \mathrm{tr}(AB) \leq \lambda_{\max}(B) \times \mathrm{tr}(A), \tag{2.40}$$

where $\lambda_{\min}(B)$ and $\lambda_{\max}(B)$ stand, respectively, for the minimum and maximum eigenvalue of $B$. Observe that $\lambda_{\max}(B) = \|B\|_{op}$, the operator norm of $B$.
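The elementary trace inequality (2.40) is easy to probe numerically. The sketch below, whose function name and random test matrices are purely illustrative, draws a positive semidefinite $A$ and a symmetric $B$ and returns the three quantities appearing in (2.40).

```python
import numpy as np

def trace_bounds(A, B):
    """Return (lambda_min(B) tr(A), tr(AB), lambda_max(B) tr(A)) as in (2.40)."""
    eigenvalues = np.linalg.eigvalsh(B)        # sorted in ascending order
    tr_A = np.trace(A)
    return eigenvalues[0] * tr_A, np.trace(A @ B), eigenvalues[-1] * tr_A

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T                                    # symmetric positive semidefinite
S = rng.standard_normal((4, 4))
B = 0.5 * (S + S.T)                            # symmetric, not necessarily positive

lower, middle, upper = trace_bounds(A, B)
print(lower, middle, upper)
```

The middle value always lies between the two outer ones; this is the only property of (2.40) used in the proof of Lemma 2.6.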

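Lemma 2.6 Fix $d \geq 1$, and assume that $B = C$. Then, $C\Gamma_t^{-1} = I_d$, and one has the following estimates:

$$\lambda_{\min}(C) \times \sum_{j=1}^d E\left[\left(\rho_{t,j}(F_t) + (C^{-1}F_t)_j\right)^2\right] \leq \mathrm{tr}\left(J_{st}(F_t)\right) \leq \lambda_{\max}(C) \times \sum_{j=1}^d E\left[\left(\rho_{t,j}(F_t) + (C^{-1}F_t)_j\right)^2\right], \tag{2.41}$$

$$\lambda_{\min}(C^{-1}) \times \sum_{j=1}^d E\left[\left(\rho_{t,j}^\star(F_t) + F_{t,j}\right)^2\right] \leq \mathrm{tr}\left(J_{st}(F_t)\right) \leq \lambda_{\max}(C^{-1}) \times \sum_{j=1}^d E\left[\left(\rho_{t,j}^\star(F_t) + F_{t,j}\right)^2\right]. \tag{2.42}$$

Proof. Write $\mathrm{tr}\left(J_{st}(F_t)\right) = \mathrm{tr}\left(C^{-1}J_{st}(F_t)C\right)$ and apply (2.40) first to $A = C^{-1}J_{st}(F_t)$ and $B = C$, and then to $A = J_{st}(F_t)C$ and $B = C^{-1}$. ■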

In the next section, we prove a new representation of the quantity $\rho_t(F_t) + C^{-1}F_t$ in terms of Stein matrices: this connection will provide the ideal framework in order to deal with the normal approximation of general random vectors.

2.3 Stein matrices and a key lemma 

The centered $d$-dimensional vectors $F, Z$ are defined as in the previous section (in particular, they are stochastically independent).

Definition 2.7 (Stein matrices) Let $M(d,\mathbb{R})$ denote the space of $d \times d$ real matrices. We say that the matrix-valued mapping

$$\tau_F : \mathbb{R}^d \to M(d,\mathbb{R}) : x \mapsto \tau_F(x) = \{\tau_F^{i,j}(x) : i,j = 1,\ldots,d\}$$

is a Stein matrix for $F$ if $\tau_F^{i,j}(F) \in L^1$ for every $i,j$ and the following equality is verified for every differentiable function $g : \mathbb{R}^d \to \mathbb{R}$ such that both sides are well-defined:

$$E[F g(F)] = E[\tau_F(F)\nabla g(F)], \tag{2.43}$$

or, equivalently,

$$E[F_i\,g(F)] = \sum_{j=1}^d E\left[\tau_F^{i,j}(F)\,\partial_j g(F)\right], \quad i = 1,\ldots,d. \tag{2.44}$$

The entries of the random matrix $\tau_F(F)$ are called the Stein factors of $F$.



Remark 2.8 (i) Selecting $g(F) = F_j$ in (2.43), one deduces that, if $\tau_F$ is a Stein matrix for $F$, then $E[\tau_F(F)] = B$, the covariance matrix of $F$. More to this point, if $F \sim \mathcal{N}_d(0,C)$, then the covariance matrix $C$ is itself a Stein matrix for $F$. This last relation is known as Stein's identity for the multivariate Gaussian distribution.

(ii) Assume that $d = 1$ and that $F$ has density $f$ and variance $b > 0$. Then, under some standard regularity assumptions, it is easy to see that $\tau_F(x) = \int_x^\infty y f(y)\,dy / f(x)$ is a Stein factor for $F$.
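To illustrate Remark 2.8 in dimension one, here is a small numerical sketch. It assumes a hypothetical centered density (a symmetric two-component Gaussian mixture, chosen only for illustration), computes the tail integral $T(x) = \int_x^\infty y f(y)\,dy$ on a grid, so that $\tau_F = T/f$, and checks both $E[\tau_F(F)] = \mathrm{Var}(F)$ and the Stein identity $E[F g(F)] = E[\tau_F(F) g'(F)]$ for $g = \sin$. Since $E[\tau_F(F) h(F)] = \int T(x) h(x)\,dx$, the division by $f$ is never performed.

```python
import numpy as np

def trapezoid(values, dx):
    """Composite trapezoidal rule on a uniform grid."""
    return dx * (values.sum() - 0.5 * (values[0] + values[-1]))

def tail_integral(x, f):
    """T(x_i) = int_{x_i}^{x_max} y f(y) dy on a uniform grid; tau_F = T / f."""
    dx = x[1] - x[0]
    yf = x * f
    segments = 0.5 * (yf[1:] + yf[:-1]) * dx   # integral over [x_i, x_{i+1}]
    return np.concatenate([np.cumsum(segments[::-1])[::-1], [0.0]])

def normal_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

x = np.linspace(-8.0, 8.0, 160_001)
dx = x[1] - x[0]
f = 0.5 * normal_pdf(x, -1.0, 0.5) + 0.5 * normal_pdf(x, 1.0, 0.5)   # centered mixture

T = tail_integral(x, f)
variance = trapezoid(x ** 2 * f, dx)           # E[F^2]
mean_tau = trapezoid(T, dx)                    # E[tau_F(F)] = int T(x) dx
lhs = trapezoid(x * np.sin(x) * f, dx)         # E[F g(F)] with g = sin
rhs = trapezoid(T * np.cos(x), dx)             # E[tau_F(F) g'(F)]
print(variance, mean_tau, lhs, rhs)
```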

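Lemma 2.9 (Key Lemma) Let the above notation and framework prevail, and assume that $\tau_F$ is a Stein matrix for $F$ such that $\tau_F^{i,j}(F) \in L^1(\Omega)$ for every $i,j = 1,\ldots,d$. Then, for every $t \in [0,1)$, the mapping

$$x \mapsto -\frac{t}{\sqrt{1-t}}\,E\left[\left(I_d - C^{-1}\tau_F(F)\right)C^{-1}Z \,\Big|\, F_t = x\right] - C^{-1}x \tag{2.45}$$

is a version of the score $\rho_t$ of $F_t$. Also, the mapping

$$x \mapsto -\frac{t}{\sqrt{1-t}}\,\Gamma_t\,E\left[\left(I_d - C^{-1}\tau_F(F)\right)C^{-1}Z \,\Big|\, F_t = x\right] - \Gamma_t C^{-1}x \tag{2.46}$$

is a version of the function $\rho_t^\star$ defined in formula (2.32).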



Proof. Remember that $-C^{-1}Z$ is the score of $Z$, and denote by $x \mapsto A_t(x)$ the mapping defined in (2.45). Removing the conditional expectation and exploiting the independence of $F$ and $Z$, we infer that, for every smooth test function $g$,

$$\begin{aligned}
E[A_t(F_t)g(F_t)] &= \frac{t}{\sqrt{1-t}}\,E\left[\left(I_d - C^{-1}\tau_F(F)\right)\left(-C^{-1}Z\right)g(F_t)\right] - \sqrt{t}\,C^{-1}E[F g(F_t)] - \sqrt{1-t}\,E\left[C^{-1}Z g(F_t)\right] \\
&= -t\,E\left[\left(I_d - C^{-1}\tau_F(F)\right)\nabla g(F_t)\right] - t\,C^{-1}E\left[\tau_F(F)\nabla g(F_t)\right] - (1-t)E[\nabla g(F_t)] \\
&= -E[\nabla g(F_t)],
\end{aligned}$$

thus yielding the desired conclusion. ■

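To simplify the forthcoming discussion, we shall use the shorthand notation: $\tilde{Z} = (\tilde{Z}_1,\ldots,\tilde{Z}_d) := C^{-1}Z \sim \mathcal{N}_d(0,C^{-1})$, $\tilde{F} = (\tilde{F}_1,\ldots,\tilde{F}_d) := C^{-1}F$, and $\tilde{\tau}_F = \{\tilde{\tau}_F^{i,j} : i,j = 1,\ldots,d\} := C^{-1}\tau_F$. The following statement is the main achievement of the section, and is obtained by combining Lemma 2.9 with formulae (2.38) and (2.41)-(2.42), in the case where $C = B$.

Theorem 2.10 Let the above notation and assumptions prevail, assume that $B = C$, and introduce the notation

$$A_1(F;Z) := \frac{1}{2}\int_0^1 \frac{t}{1-t}\,E\left[\sum_{j=1}^d E\left[\sum_{k=1}^d \left(\mathbf{1}_{j=k} - \tilde{\tau}_F^{j,k}(F)\right)\tilde{Z}_k \,\Big|\, F_t\right]^2\right]dt, \tag{2.47}$$

$$A_2(F;Z) := \frac{1}{2}\int_0^1 \frac{t}{1-t}\,E\left[\sum_{j=1}^d E\left[\sum_{k=1}^d \left(C(j,k) - \tau_F^{j,k}(F)\right)\tilde{Z}_k \,\Big|\, F_t\right]^2\right]dt. \tag{2.48}$$

Then one has the inequalities

$$\lambda_{\min}(C) \times A_1(F;Z) \leq D(F\|Z) \leq \lambda_{\max}(C) \times A_1(F;Z), \tag{2.49}$$

$$\lambda_{\min}(C^{-1}) \times A_2(F;Z) \leq D(F\|Z) \leq \lambda_{\max}(C^{-1}) \times A_2(F;Z). \tag{2.50}$$

In particular, when $C = B = I_d$,

$$D(F\|Z) = \frac{1}{2}\int_0^1 \frac{t}{1-t}\,E\left[\sum_{j=1}^d E\left[\sum_{k=1}^d \left(\mathbf{1}_{j=k} - \tau_F^{j,k}(F)\right)Z_k \,\Big|\, F_t\right]^2\right]dt. \tag{2.51}$$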



The next subsection focuses on general bounds based on the estimates (2.47)-(2.51).

2.4 A general bound 

The following statement provides the announced multidimensional generalisation of Theorem 1.7. In particular, the main estimate (2.55) provides an explicit quantitative counterpart to the heuristic fact that, if there exists a Stein matrix $\tau_F$ such that $\|\tau_F - C\|_{H.S.}$ is small (with $\|\cdot\|_{H.S.}$ denoting the usual Hilbert-Schmidt norm), then the distributions of $F$ and $Z \sim \mathcal{N}_d(0,C)$ must be close. By virtue of Theorem 2.10, the proximity of the two distributions is expressed in terms of the relative entropy $D(F\|Z)$.

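Theorem 2.11 Let $F$ be a centered and square integrable random vector with density $f : \mathbb{R}^d \to [0,\infty)$, let $C > 0$ be its covariance matrix, and assume that $\tau_F$ is a Stein matrix for $F$. Let $Z \sim \mathcal{N}_d(0,C)$ be a Gaussian random vector independent of $F$. If, for some $\kappa, \eta > 0$ and $\alpha \in (0, \frac{1}{2}]$, one has

$$E\left[|\tau_F^{j,k}(F)|^{\eta+2}\right] < \infty, \quad j,k = 1,\ldots,d, \tag{2.52}$$

as well as

$$\mathrm{TV}\left(\sqrt{t}F + \sqrt{1-t}\,x,\ F\right) \leq \kappa\,(1 + \|x\|_1)(1-t)^\alpha, \quad x \in \mathbb{R}^d,\ t \in [1/2, 1], \tag{2.53}$$

then, provided

$$\Delta := E\left[\|C - \tau_F\|_{H.S.}^2\right] = \sum_{j,k=1}^d E\left[\left(C(j,k) - \tau_F^{j,k}(F)\right)^2\right] \leq 2^{-\frac{\alpha\eta}{\eta+1}}, \tag{2.54}$$

one has

$$D(F\|Z) \leq \frac{d(\eta+1)\lambda_{\max}(C^{-1})}{2\alpha\eta}\,\max_{1\leq i\leq d} E[(\tilde{Z}_i)^2] \times \Delta|\log\Delta| + \frac{C_{d,\eta,\tau}(\eta+1)\lambda_{\max}(C^{-1})}{2\alpha\eta}\,\Delta, \tag{2.55}$$

where

$$C_{d,\eta,\tau} := 2d^2\left(2\kappa E\left[\|Z\|_1(1+\|Z\|_1)\right] + \sqrt{\max_j C(j,j)}\right)^{\frac{\eta}{\eta+1}} \max_{1\leq l\leq d} E\left[|\tilde{Z}_l|^{\eta+2}\right]^{\frac{1}{\eta+1}} \times \sum_{j,k=1}^d \left(|C(j,k)|^{\eta+2} + E\left[|\tau_F^{j,k}(F)|^{\eta+2}\right]\right)^{\frac{1}{\eta+1}}, \tag{2.56}$$

and (as above) $\tilde{Z} = C^{-1}Z$.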



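Proof. Take $g \in C_c^1$ such that $\|g\|_\infty \leq 1$. Then, by independence of $Z$ and $F$ and using (2.44), one has, for any $j = 1,\ldots,d$,

$$\begin{aligned}
E\left[\sum_{k=1}^d \left(C(j,k) - \tau_F^{j,k}(F)\right)\tilde{Z}_k\,g(F_t)\right]
&= \sum_{k,l=1}^d C(j,k)C^{-1}(k,l)E[Z_l\,g(F_t)] - \sum_{k,l=1}^d C^{-1}(k,l)E\left[\tau_F^{j,k}(F)Z_l\,g(F_t)\right] \\
&= E[Z_j\,g(F_t)] - \sqrt{1-t}\sum_{k,l,m=1}^d C^{-1}(k,l)C(l,m)E\left[\tau_F^{j,k}(F)\,\partial_m g(F_t)\right] \\
&= E\left[Z_j\left(g(F_t) - g(F)\right)\right] - \sqrt{1-t}\sum_{k=1}^d E\left[\tau_F^{j,k}(F)\,\partial_k g(F_t)\right] \\
&= E\left[Z_j\left(g(F_t) - g(F)\right)\right] - \sqrt{\frac{1-t}{t}}\,E[F_j\,g(F_t)].
\end{aligned}$$

Using (2.53), we have

$$\begin{aligned}
\left|E\left[Z_j\left(g(F_t) - g(F)\right)\right]\right|
&= \left|\int_{\mathbb{R}^d} x_j\,E\left[g(\sqrt{t}F + \sqrt{1-t}\,x) - g(F)\right]\frac{e^{-\frac{1}{2}\langle C^{-1}x,x\rangle}}{(2\pi)^{d/2}\sqrt{\det C}}\,dx\right| \\
&\leq 2\int_{\mathbb{R}^d} |x_j|\,\mathrm{TV}\left(\sqrt{t}F + \sqrt{1-t}\,x,\ F\right)\frac{e^{-\frac{1}{2}\langle C^{-1}x,x\rangle}}{(2\pi)^{d/2}\sqrt{\det C}}\,dx \\
&\leq 2\kappa(1-t)^\alpha\int_{\mathbb{R}^d} \|x\|_1(1+\|x\|_1)\frac{e^{-\frac{1}{2}\langle C^{-1}x,x\rangle}}{(2\pi)^{d/2}\sqrt{\det C}}\,dx \\
&= 2\kappa\,E\left[\|Z\|_1(1+\|Z\|_1)\right](1-t)^\alpha.
\end{aligned}$$

As a result, due to Lemma 1.5 and since $E|F_j| \leq \sqrt{E[F_j^2]} \leq \sqrt{\max_j C(j,j)}$, one obtains

$$\max_{1\leq j\leq d} E\left|E\left[\sum_{k=1}^d \left(C(j,k) - \tau_F^{j,k}(F)\right)\tilde{Z}_k \,\Big|\, F_t\right]\right| \leq \left(2\kappa E\left[\|Z\|_1(1+\|Z\|_1)\right] + \sqrt{\max_j C(j,j)}\right)(1-t)^\alpha.$$

Now, using among others the Hölder inequality and the convexity of the function $x \mapsto |x|^{\eta+2}$, we have that

$$\begin{aligned}
\sum_{j=1}^d E\left[E\left[\sum_{k=1}^d \left(C(j,k) - \tau_F^{j,k}(F)\right)\tilde{Z}_k \,\Big|\, F_t\right]^2\right]
&\leq \sum_{j=1}^d E\left[\left|E\left[\sum_{k=1}^d \left(C(j,k) - \tau_F^{j,k}(F)\right)\tilde{Z}_k \,\Big|\, F_t\right]\right|\right]^{\frac{\eta}{\eta+1}} \times E\left[\left|E\left[\sum_{k=1}^d \left(C(j,k) - \tau_F^{j,k}(F)\right)\tilde{Z}_k \,\Big|\, F_t\right]\right|^{\eta+2}\right]^{\frac{1}{\eta+1}} \\
&\leq \sum_{j=1}^d E\left[\left|E\left[\sum_{k=1}^d \left(C(j,k) - \tau_F^{j,k}(F)\right)\tilde{Z}_k \,\Big|\, F_t\right]\right|\right]^{\frac{\eta}{\eta+1}} \times E\left[\left|\sum_{k=1}^d \left(C(j,k) - \tau_F^{j,k}(F)\right)\tilde{Z}_k\right|^{\eta+2}\right]^{\frac{1}{\eta+1}} \\
&\leq C_{d,\eta,\tau}\,(1-t)^{\frac{\alpha\eta}{\eta+1}},
\end{aligned} \tag{2.57}$$

with $C_{d,\eta,\tau}$ given by (2.56). At this stage, we shall use the key identity (2.50). To properly control the right-hand side of (2.48), we split the integral in two parts: for every $0 < \varepsilon \leq \frac{1}{2}$,

$$\begin{aligned}
\frac{2}{\lambda_{\max}(C^{-1})}\,D(F\|Z)
&\leq \sum_{j=1}^d E\left[\left(\sum_{k=1}^d \left(C(j,k) - \tau_F^{j,k}(F)\right)\tilde{Z}_k\right)^2\right]\int_0^{1-\varepsilon}\frac{t\,dt}{1-t} + \int_{1-\varepsilon}^1 \frac{t}{1-t}\sum_{j=1}^d E\left[E\left[\sum_{k=1}^d \left(C(j,k) - \tau_F^{j,k}(F)\right)\tilde{Z}_k \,\Big|\, F_t\right]^2\right]dt \\
&\leq d\max_{1\leq i\leq d} E[(\tilde{Z}_i)^2]\sum_{j,k=1}^d E\left[\left(C(j,k) - \tau_F^{j,k}(F)\right)^2\right]|\log\varepsilon| + C_{d,\eta,\tau}\int_{1-\varepsilon}^1 (1-t)^{\frac{\alpha\eta}{\eta+1}-1}\,dt \\
&\leq d\max_{1\leq i\leq d} E[(\tilde{Z}_i)^2]\sum_{j,k=1}^d E\left[\left(C(j,k) - \tau_F^{j,k}(F)\right)^2\right]|\log\varepsilon| + C_{d,\eta,\tau}\,\frac{\eta+1}{\alpha\eta}\,\varepsilon^{\frac{\alpha\eta}{\eta+1}},
\end{aligned}$$

the second inequality being a consequence of (2.57) as well as $\int_0^{1-\varepsilon}\frac{t\,dt}{1-t} = \int_\varepsilon^1 \frac{(1-u)\,du}{u} \leq \int_\varepsilon^1 \frac{du}{u} = -\log\varepsilon$. Since (2.54) holds true, one can optimize over $\varepsilon$ and choose $\varepsilon = \Delta^{\frac{\eta+1}{\alpha\eta}}$, which leads to the desired estimate (2.55). ■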



Before applying the content of Theorem 2.11 to entropic CLTs on a Gaussian space (see Section 4), we devote the forthcoming Section 3 to some preliminary results about Gaussian stochastic analysis.

3 Gaussian spaces and variational calculus

As announced, we shall now focus on random variables that can be written as functionals of a countable collection of independent and identically distributed Gaussian $\mathcal{N}(0,1)$ random variables, that we shall denote by

$$G = \{G_i : i \geq 1\}. \tag{3.58}$$

Note that our description of $G$ is equivalent to saying that $G$ is a Gaussian sequence such that $E[G_i] = 0$ for every $i$ and $E[G_iG_j] = \mathbf{1}_{\{i=j\}}$. We will write $L^2(\sigma(G)) := L^2(P, \sigma(G))$ to indicate the class of square-integrable (real-valued) random variables that are measurable with respect to the $\sigma$-field generated by $G$.

The reader is referred e.g. to [331138] for any unexplained definition or result appearing 
in the subsequent subsections. 

3.1 Wiener chaos 

We will now briefly introduce the notion of Wiener chaos. 
Definition 3.1 (Hermite polynomials and Wiener chaos)

1. The sequence of Hermite polynomials $\{H_m : m \geq 0\}$ is defined as follows: $H_0 = 1$, and, for $m \geq 1$,

$$H_m(x) = (-1)^m e^{\frac{x^2}{2}}\frac{d^m}{dx^m}e^{-\frac{x^2}{2}}, \quad x \in \mathbb{R}.$$

It is a standard result that the sequence $\{(m!)^{-1/2}H_m : m \geq 0\}$ is an orthonormal basis of $L^2(\mathbb{R}, \phi_1(x)dx)$.

2. A multi-index $\alpha = \{\alpha_i : i \geq 1\}$ is a sequence of nonnegative integers such that $\alpha_i \neq 0$ only for a finite number of indices $i$. We use the symbol $\mathcal{A}$ in order to indicate the collection of all multi-indices, and use the notation $|\alpha| = \sum_{i\geq 1}\alpha_i$, for every $\alpha \in \mathcal{A}$.

3. For every integer $q \geq 0$, the $q$th Wiener chaos associated with $G$ is defined as follows: $C_0 = \mathbb{R}$, and, for $q \geq 1$, $C_q$ is the $L^2(P)$-closed vector space generated by random variables of the type

$$\Phi(\alpha) = \prod_{i=1}^\infty H_{\alpha_i}(G_i), \quad \alpha \in \mathcal{A} \text{ and } |\alpha| = q. \tag{3.59}$$
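The orthogonality relation behind Definition 3.1 can be checked numerically. The sketch below relies on NumPy's probabilists' Hermite module (`numpy.polynomial.hermite_e`), whose polynomials coincide with the $H_m$ defined above; the helper name `hermite_gram` and the number of quadrature nodes are illustrative choices. It computes the Gram matrix $E[H_m(G)H_n(G)]$ for a standard Gaussian $G$ by Gauss-Hermite quadrature, which is expected to be diagonal with entries $m!$.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_gram(max_degree, n_nodes=40):
    """Gram matrix (E[H_m(G) H_n(G)])_{m,n <= max_degree} for G ~ N(0,1)."""
    nodes, weights = hermegauss(n_nodes)       # quadrature for the weight exp(-x^2/2)
    weights = weights / np.sqrt(2.0 * np.pi)   # renormalise to the N(0,1) density
    # Row m holds H_m evaluated at the quadrature nodes.
    H = np.array([hermeval(nodes, [0.0] * m + [1.0]) for m in range(max_degree + 1)])
    return (H * weights) @ H.T

gram = hermite_gram(5)
expected = np.diag([float(factorial(m)) for m in range(6)])
print(np.round(gram, 8))
```

Dividing $H_m$ by $\sqrt{m!}$ therefore yields the orthonormal family mentioned in item 1.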



It is easily seen that two random variables belonging to Wiener chaoses of different orders are orthogonal in $L^2(\sigma(G))$. Moreover, since linear combinations of polynomials are dense in $L^2(\sigma(G))$, one has that $L^2(\sigma(G)) = \bigoplus_{q\geq 0} C_q$, that is, any square-integrable functional of $G$ can be written as an infinite sum, converging in $L^2$ and such that the $q$th summand is an element of $C_q$. This orthogonal decomposition of $L^2(\sigma(G))$ is customarily called the Wiener-Itô chaotic decomposition of $L^2(\sigma(G))$.

It is often convenient to encode random variables in the spaces $C_q$ by means of increasing tensor powers of Hilbert spaces. To do this, introduce an (arbitrary) separable real Hilbert space $\mathfrak{H}$ having an orthonormal basis $\{e_i : i \geq 1\}$. For $q \geq 2$, denote by $\mathfrak{H}^{\otimes q}$ (resp. $\mathfrak{H}^{\odot q}$) the $q$th tensor power (resp. symmetric tensor power) of $\mathfrak{H}$; write moreover $\mathfrak{H}^{\otimes 0} = \mathfrak{H}^{\odot 0} = \mathbb{R}$ and $\mathfrak{H}^{\otimes 1} = \mathfrak{H}^{\odot 1} = \mathfrak{H}$. With every multi-index $\alpha \in \mathcal{A}$, we associate the tensor $e(\alpha) \in \mathfrak{H}^{\otimes|\alpha|}$ given by

$$e(\alpha) = e_{i_1}^{\otimes\alpha_{i_1}} \otimes \cdots \otimes e_{i_k}^{\otimes\alpha_{i_k}},$$

where $\{\alpha_{i_1},\ldots,\alpha_{i_k}\}$ are the non-zero elements of $\alpha$. We also denote by $\tilde{e}(\alpha) \in \mathfrak{H}^{\odot|\alpha|}$ the canonical symmetrization of $e(\alpha)$. It is well-known that, for every $q \geq 2$, the set $\{\tilde{e}(\alpha) : \alpha \in \mathcal{A}, |\alpha| = q\}$ is a complete orthogonal system in $\mathfrak{H}^{\odot q}$. For every $q \geq 1$ and every $h \in \mathfrak{H}^{\odot q}$ of the form $h = \sum_{\alpha\in\mathcal{A},\,|\alpha|=q} c_\alpha\,\tilde{e}(\alpha)$, we define

$$I_q(h) = \sum_{\alpha\in\mathcal{A},\,|\alpha|=q} c_\alpha\,\Phi(\alpha), \tag{3.60}$$

where $\Phi(\alpha)$ is given in (3.59). Another classical result (see e.g. [551 [35]) is that, for every $q \geq 1$, the mapping $I_q : \mathfrak{H}^{\odot q} \to C_q$ (as defined in (3.60)) is onto, and defines an isomorphism between $C_q$ and the Hilbert space $\mathfrak{H}^{\odot q}$, endowed with the modified norm $\sqrt{q!}\,\|\cdot\|_{\mathfrak{H}^{\otimes q}}$. This means that, for every $h, h' \in \mathfrak{H}^{\odot q}$, $E[I_q(h)I_q(h')] = q!\,\langle h,h'\rangle_{\mathfrak{H}^{\otimes q}}$.

Finally, we observe that one can reexpress the Wiener-Itô chaotic decomposition of $L^2(\sigma(G))$ as follows: every $F \in L^2(\sigma(G))$ admits a unique decomposition of the type

$$F = \sum_{q=0}^\infty I_q(h_q), \tag{3.61}$$

where the series converges in $L^2(\sigma(G))$, the symmetric kernels $h_q \in \mathfrak{H}^{\odot q}$, $q \geq 1$, are uniquely determined by $F$, and $I_0(h_0) := E[F]$. This also implies that

$$E[F^2] = E[F]^2 + \sum_{q=1}^\infty q!\,\|h_q\|_{\mathfrak{H}^{\otimes q}}^2.$$

3.2 The language of Malliavin calculus: chaoses as eigenspaces 

We let the previous notation and assumptions prevail: in particular, we shall fix for the rest of the section a real separable Hilbert space $\mathfrak{H}$, and represent the elements of the $q$th Wiener chaos of $G$ in the form (3.60). In addition to the previously introduced notation, $L^2(\mathfrak{H}) := L^2(\sigma(G);\mathfrak{H})$ indicates the space of all $\mathfrak{H}$-valued random elements $u$ that are measurable with respect to $\sigma(G)$ and verify the relation $E[\|u\|_{\mathfrak{H}}^2] < \infty$. Note that, as is customary, $\mathfrak{H}$ is endowed with the Borel $\sigma$-field associated with the distance on $\mathfrak{H}$ given by $(h_1,h_2) \mapsto \|h_1 - h_2\|_{\mathfrak{H}}$.

Let $\mathcal{S}$ be the set of all smooth cylindrical random variables of the form

$$F = g\left(I_1(h_1),\ldots,I_1(h_n)\right),$$

where $n \geq 1$, $g : \mathbb{R}^n \to \mathbb{R}$ is a smooth function with compact support and $h_i \in \mathfrak{H}$. The Malliavin derivative of $F$ (with respect to $G$) is the element of $L^2(\mathfrak{H})$ defined as

$$DF = \sum_{i=1}^n \partial_i g\left(I_1(h_1),\ldots,I_1(h_n)\right)h_i.$$

By iteration, one can define the $m$th derivative $D^mF$ (which is an element of $L^2(\mathfrak{H}^{\odot m})$) for every $m \geq 2$. For $m \geq 1$, $\mathbb{D}^{m,2}$ denotes the closure of $\mathcal{S}$ with respect to the norm $\|\cdot\|_{m,2}$, defined by the relation

$$\|F\|_{m,2}^2 = E[F^2] + \sum_{i=1}^m E\left[\|D^iF\|_{\mathfrak{H}^{\otimes i}}^2\right].$$

It is a standard result that a random variable $F$ as in (3.61) is in $\mathbb{D}^{m,2}$ if and only if $\sum_{q\geq 1} q^m q!\,\|h_q\|_{\mathfrak{H}^{\otimes q}}^2 < \infty$, from which one deduces that $\bigoplus_{k=0}^q C_k \subset \mathbb{D}^{m,2}$ for every $q, m \geq 1$. Also, $DI_1(h) = h$ for every $h \in \mathfrak{H}$. The Malliavin derivative $D$ satisfies the following chain rule: if $g : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and has bounded partial derivatives, and if $(F_1,\ldots,F_n)$ is a vector of elements of $\mathbb{D}^{1,2}$, then $g(F_1,\ldots,F_n) \in \mathbb{D}^{1,2}$ and

$$Dg(F_1,\ldots,F_n) = \sum_{i=1}^n \partial_i g(F_1,\ldots,F_n)\,DF_i. \tag{3.62}$$

In what follows, we denote by $\delta$ the adjoint of the operator $D$, also called the divergence operator. A random element $u \in L^2(\mathfrak{H})$ belongs to the domain of $\delta$, written $\mathrm{Dom}\,\delta$, if and only if it satisfies

$$\left|E[\langle DF,u\rangle_{\mathfrak{H}}]\right| \leq c_u\sqrt{E[F^2]} \quad \text{for any } F \in \mathcal{S},$$

for some constant $c_u$ depending only on $u$. If $u \in \mathrm{Dom}\,\delta$, then the random variable $\delta(u)$ is defined by the duality relationship (customarily called "integration by parts formula"):

$$E[F\delta(u)] = E[\langle DF,u\rangle_{\mathfrak{H}}], \tag{3.63}$$

which holds for every $F \in \mathbb{D}^{1,2}$. A crucial object for our discussion is the Ornstein-Uhlenbeck semigroup associated with $G$.

Definition 3.2 (Ornstein-Uhlenbeck semigroup) Let $G'$ be an independent copy of $G$, and denote by $E'$ the mathematical expectation with respect to $G'$. For every $t \geq 0$ the operator $P_t : L^2(\sigma(G)) \to L^2(\sigma(G))$ is defined as follows: for every $F(G) \in L^2(\sigma(G))$,

$$P_tF(G) = E'\left[F\left(e^{-t}G + \sqrt{1-e^{-2t}}\,G'\right)\right],$$

in such a way that $P_0F(G) = F(G)$ and $P_\infty F(G) = E[F(G)]$. The collection $\{P_t : t \geq 0\}$ verifies the semigroup property $P_tP_s = P_{t+s}$ and is called the Ornstein-Uhlenbeck semigroup associated with $G$.

The properties of the semigroup $\{P_t : t \geq 0\}$ that are relevant for our study are gathered together in the next statement.

Proposition 3.3 1. For every $t > 0$, the eigenspaces of the operator $P_t$ coincide with the Wiener chaoses $C_q$, $q = 0, 1, \ldots$, the eigenvalue of $C_q$ being given by the positive constant $e^{-qt}$.

2. The infinitesimal generator of $\{P_t : t \geq 0\}$, denoted by $L$, acts on square-integrable random variables as follows: a random variable $F$ with the form (3.61) is in the domain of $L$, written $\mathrm{Dom}\,L$, if and only if $\sum_{q\geq 1} q\,I_q(h_q)$ is convergent in $L^2(\sigma(G))$, and in this case

$$LF = -\sum_{q\geq 1} q\,I_q(h_q).$$

In particular, each Wiener chaos $C_q$ is an eigenspace of $L$, with eigenvalue equal to $-q$.

3. The operator $L$ verifies the following properties: (i) $\mathrm{Dom}\,L = \mathbb{D}^{2,2}$, and (ii) a random variable $F$ is in $\mathrm{Dom}\,L$ if and only if $F \in \mathrm{Dom}\,\delta D$ (i.e. $F \in \mathbb{D}^{1,2}$ and $DF \in \mathrm{Dom}\,\delta$), and in this case one has that $\delta(DF) = -LF$.
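Item 1 of Proposition 3.3 can be illustrated in the simplest situation where $F$ depends on a single coordinate, $F = H_q(G_1)$, in which case Definition 3.2 reduces to a one-dimensional Gaussian integral. The sketch below (the function names `ou_semigroup` and `hermite` are hypothetical helpers, and the expectation over $G'$ is approximated by Gauss-Hermite quadrature) compares $P_tH_3$ with $e^{-3t}H_3$ on a few points.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def ou_semigroup(func, x, t, n_nodes=20):
    """P_t func(x) = E'[func(e^{-t} x + sqrt(1 - e^{-2t}) G')] for scalar inputs x."""
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)   # weights of the N(0,1) law
    a = np.exp(-t)
    b = np.sqrt(1.0 - np.exp(-2.0 * t))
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.array([np.sum(weights * func(a * xi + b * nodes)) for xi in x])

def hermite(q):
    """The q-th (probabilists') Hermite polynomial as a callable."""
    return lambda y: hermeval(y, [0.0] * q + [1.0])

x = np.linspace(-2.0, 2.0, 9)
t = 0.7
lhs = ou_semigroup(hermite(3), x, t)           # P_t H_3
rhs = np.exp(-3.0 * t) * hermite(3)(x)         # e^{-3t} H_3
print(np.max(np.abs(lhs - rhs)))
```

The same routine, applied at $t = 0$, returns the function itself, in agreement with $P_0F = F$.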

In view of the previous statement, it is immediate to describe the pseudo-inverse of $L$, denoted by $L^{-1}$, as follows: for every mean zero random variable $F = \sum_{q\geq 1} I_q(h_q)$ of $L^2(\sigma(G))$, one has that

$$L^{-1}F = -\sum_{q\geq 1}\frac{1}{q}\,I_q(h_q).$$

It is clear that $L^{-1}$ is an operator with values in $\mathbb{D}^{2,2}$.

For future reference, we record the following estimate involving random variables living in a finite sum of Wiener chaoses: it is a direct consequence of the hypercontractivity of the Ornstein-Uhlenbeck semigroup, see e.g. [33, Theorem 2.7.2 and Theorem 2.8.12].

Proposition 3.4 (Hypercontractivity) Let $q \geq 1$ and $1 \leq s < t < \infty$. Then, there exists a finite constant $c(s,t,q) < \infty$ such that, for every $F \in \bigoplus_{k=0}^q C_k$,

$$E[|F|^t]^{\frac{1}{t}} \leq c(s,t,q)\,E[|F|^s]^{\frac{1}{s}}. \tag{3.64}$$

In particular, all $L^p$ norms, $p \geq 1$, are equivalent on a finite sum of Wiener chaoses.

Since we will systematically work on a fixed sum of Wiener chaoses, we will not need to specify the explicit value of the constant $c(s,t,q)$. See again [33], and the references therein, for more details.

Example 3.5 Let $W = \{W_t : t \geq 0\}$ be a standard Brownian motion, let $\{e_j : j \geq 1\}$ be an orthonormal basis of $L^2(\mathbb{R}_+, \mathcal{B}(\mathbb{R}_+), dt) =: L^2(\mathbb{R}_+)$, and define $G_j = \int_0^\infty e_j(t)\,dW_t$. Then, $\sigma(W) = \sigma(G)$, where $G = \{G_j : j \geq 1\}$. In this case, the natural choice of a Hilbert space is $\mathfrak{H} = L^2(\mathbb{R}_+)$ and one has the following explicit characterisation of the Wiener chaoses associated with $W$: for every $q \geq 1$, one has that $F \in C_q$ if and only if there exists a symmetric kernel $f \in L^2(\mathbb{R}_+^q)$ such that

$$F = q!\int_0^\infty\int_0^{t_1}\cdots\int_0^{t_{q-1}} f(t_1,\ldots,t_q)\,dW_{t_q}\cdots dW_{t_1} := q!\,J_q(f).$$

The random variable $J_q(f)$ is called the multiple Wiener-Itô integral of order $q$, of $f$ with respect to $W$. It is a well-known fact that, if $F \in \mathbb{D}^{1,2}$ admits the chaotic expansion $F = E[F] + \sum_{q\geq 1} J_q(f_q)$, then $DF$ equals the random function

$$t \mapsto \sum_{q\geq 1} J_{q-1}\left(f_q(\cdot,t)\right) = \sum_{q\geq 1}\int_0^\infty\int_0^{t_1}\cdots\int_0^{t_{q-2}} f_q(t_1,\ldots,t_{q-1},t)\,dW_{t_{q-1}}\cdots dW_{t_1}, \quad t \in \mathbb{R}_+,$$

which is a well-defined element of $L^2(\mathfrak{H})$.

3.3 The role of Malliavin and Stein matrices 

Given a vector $F = (F_1,\ldots,F_d)$ whose elements are in $\mathbb{D}^{1,2}$, we define the Malliavin matrix $\Gamma(F) = \{\Gamma_{i,j}(F) : i,j = 1,\ldots,d\}$ as

$$\Gamma_{i,j}(F) = \langle DF_i, DF_j\rangle_{\mathfrak{H}}.$$

The following statement is taken from [31], and provides a simple necessary and sufficient condition for a random vector living in a finite sum of Wiener chaoses to have a density.

Theorem 3.6 Fix $d, q \geq 1$ and let $F = (F_1,\ldots,F_d)$ be such that $F_i \in \bigoplus_{k=0}^q C_k$, $i = 1,\ldots,d$. Then, the distribution of $F$ admits a density $f$ with respect to the Lebesgue measure on $\mathbb{R}^d$ if and only if $E[\det\Gamma(F)] > 0$. Moreover, if this condition is verified one has necessarily that $\mathrm{Ent}(F) < \infty$.

Proof. The equivalence between the existence of a density and the fact that $E[\det\Gamma(F)] > 0$ is shown in [31, Theorem 3.1]. Moreover, by [22, Theorem 4.2] we have in this case that the density $f$ satisfies $f \in \bigcup_{p>1} L^p(\mathbb{R}^d)$. Relying on the inequality

$$\log u \leq n\left(u^{1/n} - 1\right), \quad u > 0,\ n \in \mathbb{N}$$

(which is a direct consequence of $\log u \leq u - 1$ for any $u > 0$), one has that

$$\int_{\mathbb{R}^d} f(x)\log f(x)\,dx \leq n\left(\int_{\mathbb{R}^d} f(x)^{1+\frac{1}{n}}\,dx - 1\right).$$

Hence, by choosing $n$ large enough so that $f \in L^{1+\frac{1}{n}}(\mathbb{R}^d)$, one obtains that $\mathrm{Ent}(F) < \infty$. ■



To conclude, we present a result providing an explicit representation for Stein matrices associated with random vectors in the domain of $D$.

Proposition 3.7 (Representation of Stein matrices) Fix $d \geq 1$ and let $F = (F_1,\ldots,F_d)$ be a centered random vector whose elements are in $\mathbb{D}^{1,2}$. Then, a Stein matrix for $F$ (see Definition 2.7) is given by

$$\tau_F^{i,j}(x) = E\left[\langle -DL^{-1}F_i, DF_j\rangle_{\mathfrak{H}} \mid F = x\right], \quad i,j = 1,\ldots,d.$$

Proof. Let $g : \mathbb{R}^d \to \mathbb{R}$ be in $C_c^1$. For every $i = 1,\ldots,d$, one has that $F_i = -\delta DL^{-1}F_i$. As a consequence, using (in order) (3.63) and (3.62), one infers that

$$E[F_i\,g(F)] = E\left[\langle -DL^{-1}F_i, Dg(F)\rangle_{\mathfrak{H}}\right] = \sum_{j=1}^d E\left[\langle -DL^{-1}F_i, DF_j\rangle_{\mathfrak{H}}\,\partial_j g(F)\right].$$

Taking conditional expectations yields the desired conclusion. ■
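As an illustration of Proposition 3.7, consider the hypothetical one-dimensional example $F = (G_1^2 - 1)/\sqrt{2}$, which belongs to the second Wiener chaos and has unit variance. Here $DF = \sqrt{2}\,G_1 e_1$ and $L^{-1}F = -F/2$, so that $\langle -DL^{-1}F, DF\rangle_{\mathfrak{H}} = G_1^2 = 1 + \sqrt{2}F$, which is already a function of $F$. The sketch below checks the Stein identity (2.44) for the test function $g(y) = y^3$ by Gauss-Hermite quadrature; all names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

nodes, weights = hermegauss(30)                # quadrature nodes play the role of G_1
weights = weights / np.sqrt(2.0 * np.pi)       # weights of the N(0,1) law

F = (nodes ** 2 - 1.0) / np.sqrt(2.0)          # element of the second chaos, E[F^2] = 1
tau = nodes ** 2                               # <-D L^{-1} F, DF> = (1/2) ||DF||^2 = G_1^2

def g(y):
    return y ** 3

def dg(y):
    return 3.0 * y ** 2

lhs = np.sum(weights * F * g(F))               # E[F g(F)]
rhs = np.sum(weights * tau * dg(F))            # E[tau_F(F) g'(F)]
mean_tau = np.sum(weights * tau)               # E[tau_F(F)] = Var(F)
print(lhs, rhs, mean_tau)
```

Both sides of the identity coincide, and the mean of the Stein factor equals the variance of $F$, as predicted by Remark 2.8.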

The next section contains the statements and proofs of our main bounds on a Gaussian 
space. 

4 Entropic fourth moment bounds on a Gaussian space 

4.1 Main results 

We let the framework of the previous section prevail: in particular, $G$ is a sequence of i.i.d. $\mathcal{N}(0,1)$ random variables, and the sequence of Wiener chaoses $\{C_q : q \geq 1\}$ associated with $G$ is encoded by means of increasing tensor powers of a fixed real separable Hilbert space $\mathfrak{H}$. We will use the following notation: given a sequence of centered and square-integrable $d$-dimensional random vectors $F_n = (F_{1,n},\ldots,F_{d,n}) \in \mathbb{D}^{1,2}$ with covariance matrix $C_n$, $n \geq 1$, we let

$$\Delta_n := E[\|F_n\|^4] - E[\|Z_n\|^4], \tag{4.65}$$

where $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^d$ and $Z_n \sim \mathcal{N}_d(0,C_n)$.

Our main result is the following entropic central limit theorem for sequences of chaotic 
random variables. 

Theorem 4.1 (Entropic CLTs on Wiener chaos) Let $d \geq 1$ and $q_1,\ldots,q_d \geq 1$ be fixed integers. Consider vectors

$$F_n = (F_{1,n},\ldots,F_{d,n}) = \left(I_{q_1}(h_{1,n}),\ldots,I_{q_d}(h_{d,n})\right), \quad n \geq 1,$$

with $h_{i,n} \in \mathfrak{H}^{\odot q_i}$. Let $C_n$ denote the covariance matrix of $F_n$ and let $Z_n \sim \mathcal{N}_d(0,C_n)$ be a centered Gaussian random vector in $\mathbb{R}^d$ with the same covariance matrix as $F_n$. Assume that $C_n \to C > 0$ and $\Delta_n \to 0$, as $n \to \infty$. Then, the random vector $F_n$ admits a density for $n$ large enough, and

$$D(F_n\|Z_n) = O(1)\,\Delta_n|\log\Delta_n| \quad \text{as } n \to \infty, \tag{4.66}$$

where $O(1)$ indicates a bounded numerical sequence depending on $d, q_1,\ldots,q_d$, as well as on the sequence $\{F_n\}$.

One immediate consequence of the previous statement is the following characterisation 
of entropic CLTs on a finite sum of Wiener chaoses. 

Corollary 4.2 Let the sequence $\{F_n\}$ be as in the statement of Theorem 4.1 and assume that $C_n \to C > 0$. Then, the following three assertions are equivalent, as $n \to \infty$:

(i) $\Delta_n \to 0$;

(ii) $F_n$ converges in distribution to $Z \sim \mathcal{N}_d(0,C)$;

(iii) $D(F_n\|Z_n) \to 0$.
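To make the quantity $\Delta_n$ concrete, consider the hypothetical one-dimensional sequence $F_n = (2n)^{-1/2}\sum_{i=1}^n (G_i^2 - 1)$, which lives in the second Wiener chaos and has unit variance. Since the summands are i.i.d., $E[F_n^4] - 3$ equals the fourth cumulant of $G_1^2 - 1$ divided by $n$ times its squared variance. The sketch below evaluates $\Delta_n$ in this way and displays the order of magnitude $\Delta_n|\log\Delta_n|$ appearing in (4.66); the bounded factor $O(1)$ is not computed, and the function names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def fourth_moment_gap(n, n_nodes=20):
    """Delta_n = E[F_n^4] - 3 for F_n = (2n)^{-1/2} sum_{i<=n} (G_i^2 - 1)."""
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)
    x = nodes ** 2 - 1.0                        # one centered summand
    m2 = np.sum(weights * x ** 2)               # variance of a summand
    m4 = np.sum(weights * x ** 4)               # fourth moment of a summand
    kappa4 = m4 - 3.0 * m2 ** 2                 # fourth cumulant of a summand
    return kappa4 / (n * m2 ** 2)

def entropic_rate(delta):
    """Order of magnitude Delta |log Delta| from (4.66), without the O(1) factor."""
    return delta * abs(np.log(delta))

for n in (100, 1000, 10000):
    delta_n = fourth_moment_gap(n)
    print(n, delta_n, entropic_rate(delta_n))
```

In this example $\Delta_n$ decays like $1/n$, so that the entropic bound is of order $\log n / n$ up to the constant.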

The proofs of the previous results are based on two technical lemmas that are the 
object of the next section. 

4.2 Estimates based on the Carbery-Wright inequalities

We start with a generalisation of the inequality proved by Carbery and Wright in [11]. Recall that, in a form that is adapted to our framework, the main finding of [11] reads as follows: there is a universal constant $c > 0$ such that, for any polynomial $Q : \mathbb{R}^n \to \mathbb{R}$ of degree at most $d$ and any $\alpha > 0$ we have

$$E\left[Q(X_1,\ldots,X_n)^2\right]^{\frac{1}{2d}}\,P\left(|Q(X_1,\ldots,X_n)| \leq \alpha\right) \leq c\,d\,\alpha^{\frac{1}{d}}, \tag{4.67}$$

where $X_1,\ldots,X_n$ are independent random variables with common distribution $\mathcal{N}(0,1)$.

Lemma 4.3 Fix $d, q_1,\ldots,q_d \geq 1$, and let $F = (F_1,\ldots,F_d)$ be a random vector such that $F_i = I_{q_i}(h_i)$ with $h_i \in \mathfrak{H}^{\odot q_i}$. Let $\Gamma = \Gamma(F)$ denote the Malliavin matrix of $F$, and assume that $E[\det\Gamma] > 0$ (which is equivalent to assuming that $F$ has a density by Theorem 3.6). Set $N = 2d(q-1)$ with $q = \max_{1\leq j\leq d} q_j$. Then, there exists a universal constant $c > 0$ such that

$$P(\det\Gamma \leq \lambda) \leq cN\lambda^{1/N}\left(E[\det\Gamma]\right)^{-1/N}. \tag{4.68}$$

Proof. Let $\{e_i : i \geq 1\}$ be an orthonormal basis of $\mathfrak{H}$. Since $\det\Gamma$ is a polynomial of degree $d$ in the entries of $\Gamma$ and because each entry of $\Gamma$ belongs to $\bigoplus_{k=0}^{2q-2} C_k$ by the product formula for multiple integrals (see, e.g., [33, Chapter 2]), we have, by iterating the product formula, that $\det\Gamma \in \bigoplus_{k=0}^N C_k$. Thus, there exists a sequence $\{Q_n, n \geq 1\}$ of real-valued polynomials of degree at most $N$ such that the random variables $Q_n(I_1(e_1),\ldots,I_1(e_n))$ converge in $L^2$ and almost surely to $\det\Gamma$ as $n$ tends to infinity (see [31, proof of Theorem 3.1] for an explicit construction). Assume now that $E[\det\Gamma] > 0$. Then, for $n$ sufficiently large, $E[|Q_n(I_1(e_1),\ldots,I_1(e_n))|] > 0$. We deduce from the estimate (4.67) the existence of a universal constant $c > 0$ such that, for any $n \geq 1$,

$$P\left(|Q_n(I_1(e_1),\ldots,I_1(e_n))| \leq \lambda\right) \leq cN\lambda^{1/N}\left(E\left[Q_n(I_1(e_1),\ldots,I_1(e_n))^2\right]\right)^{-1/2N}.$$

Using the property

$$E\left[Q_n(I_1(e_1),\ldots,I_1(e_n))^2\right] \geq \left(E\left[|Q_n(I_1(e_1),\ldots,I_1(e_n))|\right]\right)^2$$

we obtain

$$P\left(|Q_n(I_1(e_1),\ldots,I_1(e_n))| \leq \lambda\right) \leq cN\lambda^{1/N}\left(E\left[|Q_n(I_1(e_1),\ldots,I_1(e_n))|\right]\right)^{-1/N},$$

from which (4.68) follows by letting $n$ tend to infinity. ■

The next statement, whose proof follows the same lines as that of [31, Theorem 4.1], provides an upper bound on the total variation distance between the distribution of $F$ and that of $\sqrt{t}F + \sqrt{1-t}\,x$, for every $x \in \mathbb{R}^d$ and every $t \in [1/2, 1]$. Although Lemma 4.4 is, in principle, very close to [31, Theorem 4.1], we detail its proof because, here, we need to keep track of the way the constants behave with respect to $x$.

Lemma 4.4 Fix $d, q_1,\ldots,q_d \geq 1$, and let $F = (F_1,\ldots,F_d)$ be a random vector as in Lemma 4.3. Set $q = \max_{1\leq j\leq d} q_j$. Let $C$ be the covariance matrix of $F$, let $\Gamma = \Gamma(F)$ denote the Malliavin matrix of $F$, and assume that $\beta := E[\det\Gamma] > 0$. Then, there exists a constant $c_{q,d,\|C\|_{H.S.}} > 0$ (depending only on $q$, $d$ and $\|C\|_{H.S.}$, with a continuous dependence in the last parameter) such that, for any $x \in \mathbb{R}^d$ and $t \in [\frac{1}{2}, 1]$,

$$\mathrm{TV}\left(\sqrt{t}F + \sqrt{1-t}\,x,\ F\right) \leq c_{q,d,\|C\|_{H.S.}}\left(\beta^{-\frac{1}{N+1}} \wedge 1\right)(1 + \|x\|_1)(1-t)^{\frac{1}{2(2N+4)(d+1)+2}}. \tag{4.69}$$

Here, $N = 2d(q-1)$.

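Proof. The proof is divided into several steps. In what follows, we fix $t \in [\frac{1}{2}, 1]$ and $x \in \mathbb{R}^d$, and we use the convention that $c_{\{\cdot\}}$ denotes a constant in $(0,\infty)$ that only depends on the arguments inside the bracket and whose value is allowed to change from one line to another.

Step 1. One has that $E[\|F\|_\infty^2] \leq c_d E[\|F\|_2^2] \leq c_d\sqrt{\sum_{i,j=1}^d C(i,j)^2} = c_d\|C\|_{H.S.}$, so that $E[\|F\|_\infty] \leq c_d\sqrt{\|C\|_{H.S.}} = c_{d,\|C\|_{H.S.}}$. Let $g$ be a (smooth) test function bounded by 1. We can write, for any $M \geq 1$,

$$\begin{aligned}
\left|E[g(\sqrt{t}F + \sqrt{1-t}\,x)] - E[g(F)]\right|
&\leq \left|E\left[(g\mathbf{1}_{[-M/2,M/2]^d})(\sqrt{t}F + \sqrt{1-t}\,x)\right] - E\left[(g\mathbf{1}_{[-M/2,M/2]^d})(F)\right]\right| \\
&\quad + P\left(\|\sqrt{t}F + \sqrt{1-t}\,x\|_\infty \geq M/2\right) + P\left(\|F\|_\infty \geq M/2\right) \\
&\leq \sup_{\substack{\|\phi\|_\infty\leq 1 \\ \mathrm{supp}\,\phi\subset[-M,M]^d}} \left|E[\phi(\sqrt{t}F + \sqrt{1-t}\,x)] - E[\phi(F)]\right| \\
&\quad + P\left(\|\sqrt{t}F + \sqrt{1-t}\,x\|_\infty \geq M/2\right) + P\left(\|F\|_\infty \geq M/2\right) \\
&\leq \sup_{\substack{\|\phi\|_\infty\leq 1 \\ \mathrm{supp}\,\phi\subset[-M,M]^d}} \left|E[\phi(\sqrt{t}F + \sqrt{1-t}\,x)] - E[\phi(F)]\right| + \frac{c_{d,\|C\|_{H.S.}}}{M}(1 + \|x\|_\infty).
\end{aligned} \tag{4.70}$$

Step 2. Let $\phi : \mathbb{R}^d \to \mathbb{R}$ be $C^\infty$ with compact support in $[-M,M]^d$ and satisfying $\|\phi\|_\infty \leq 1$. Let $0 < \alpha \leq 1$ and let $\rho : \mathbb{R}^d \to \mathbb{R}_+$ be in $C_c^\infty$ and satisfying $\int_{\mathbb{R}^d}\rho(x)dx = 1$. Set $\rho_\alpha(x) = \frac{1}{\alpha^d}\rho(\frac{x}{\alpha})$. By [36, formula (3.26)], we have that $\phi * \rho_\alpha$ is Lipschitz continuous with constant $1/\alpha$. We can thus write,

$$\begin{aligned}
\left|E[\phi(\sqrt{t}F + \sqrt{1-t}\,x)] - E[\phi(F)]\right|
&\leq \left|E\left[\phi*\rho_\alpha(\sqrt{t}F + \sqrt{1-t}\,x) - \phi*\rho_\alpha(F)\right]\right| + \left|E\left[\phi(F) - \phi*\rho_\alpha(F)\right]\right| \\
&\quad + \left|E\left[\phi(\sqrt{t}F + \sqrt{1-t}\,x) - \phi*\rho_\alpha(\sqrt{t}F + \sqrt{1-t}\,x)\right]\right| \\
&\leq \frac{\sqrt{1-t}}{\alpha}\left(E[\|F\|_\infty] + \|x\|_\infty\right) + \left|E\left[\phi(F) - \phi*\rho_\alpha(F)\right]\right| \\
&\quad + \left|E\left[\phi(\sqrt{t}F + \sqrt{1-t}\,x) - \phi*\rho_\alpha(\sqrt{t}F + \sqrt{1-t}\,x)\right]\right| \\
&\leq c_{d,\|C\|_{H.S.}}\frac{\sqrt{1-t}}{\alpha}(1 + \|x\|_\infty) + \left|E\left[\phi(F) - \phi*\rho_\alpha(F)\right]\right| \\
&\quad + \left|E\left[\phi(\sqrt{t}F + \sqrt{1-t}\,x) - \phi*\rho_\alpha(\sqrt{t}F + \sqrt{1-t}\,x)\right]\right|.
\end{aligned} \tag{4.71}$$

In order to estimate the two last terms in (4.71), we decompose the expectation into two parts using the identity

$$1 = \frac{\varepsilon}{t^d\det\Gamma + \varepsilon} + \frac{t^d\det\Gamma}{t^d\det\Gamma + \varepsilon}, \quad \varepsilon > 0.$$

Step 3. For all $\varepsilon, \lambda > 0$ and using (4.68),

$$E\left[\frac{\varepsilon}{t^d\det\Gamma + \varepsilon}\right] \leq E\left[\frac{\varepsilon}{t^d\det\Gamma + \varepsilon}\mathbf{1}_{\{\det\Gamma > \lambda\}}\right] + cN\left(\frac{\lambda}{E[\det\Gamma]}\right)^{1/N} \leq \frac{\varepsilon}{t^d\lambda} + cN\left(\frac{\lambda}{\beta}\right)^{1/N}.$$

Choosing $\lambda = \varepsilon^{\frac{N}{N+1}}\beta^{\frac{1}{N+1}}$ yields

$$E\left[\frac{\varepsilon}{t^d\det\Gamma + \varepsilon}\right] \leq \left(t^{-d} + cN\right)\left(\frac{\varepsilon}{\beta}\right)^{\frac{1}{N+1}} \leq c_{q,d}\left(\frac{\varepsilon}{\beta}\right)^{\frac{1}{N+1}}$$

(recall that $t \geq \frac{1}{2}$). As a consequence,

$$\begin{aligned}
&\left|E\left[\phi(\sqrt{t}F + \sqrt{1-t}\,x) - \phi*\rho_\alpha(\sqrt{t}F + \sqrt{1-t}\,x)\right]\right| \\
&\quad = \left|E\left[\left(\phi(\sqrt{t}F + \sqrt{1-t}\,x) - \phi*\rho_\alpha(\sqrt{t}F + \sqrt{1-t}\,x)\right)\times\left(\frac{\varepsilon}{t^d\det\Gamma + \varepsilon} + \frac{t^d\det\Gamma}{t^d\det\Gamma + \varepsilon}\right)\right]\right| \\
&\quad \leq c_{q,d}\left(\frac{\varepsilon}{\beta}\right)^{\frac{1}{N+1}} + \left|E\left[\left(\phi(\sqrt{t}F + \sqrt{1-t}\,x) - \phi*\rho_\alpha(\sqrt{t}F + \sqrt{1-t}\,x)\right)\frac{t^d\det\Gamma}{t^d\det\Gamma + \varepsilon}\right]\right|.
\end{aligned} \tag{4.72}$$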



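Step 4. In this step we recall from [31, page 11] the following integration by parts formula (4.73). Let $h : \mathbb{R}^d \to \mathbb{R}$ be a function in $C^\infty$ with compact support, and consider a random variable $W \in \mathbb{D}^\infty$. Consider the Poisson kernel in $\mathbb{R}^d$, defined as the solution to the equation $\Delta Q_d = \delta_0$. We know that $Q_2(x) = c_2\log|x|$ and $Q_d(x) = c_d|x|^{2-d}$ for $d \neq 2$. We have the following identity

$$E\left[W\det\Gamma\,h(\sqrt{t}F + \sqrt{1-t}\,x)\right] = \frac{1}{\sqrt{t}}\sum_{i=1}^d E\left[A_i(W)\int_{\mathbb{R}^d} h(y)\,\partial_iQ_d(\sqrt{t}F + \sqrt{1-t}\,x - y)\,dy\right], \tag{4.73}$$

where

$$A_i(W) = -\sum_{a=1}^d\left(\left\langle D\left(W(\mathrm{Adj}\,\Gamma)_{a,i}\right), DF_a\right\rangle_{\mathfrak{H}} + (\mathrm{Adj}\,\Gamma)_{a,i}\,W\,LF_a\right),$$

with $\mathrm{Adj}$ the usual adjugate matrix operator.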

Step 5. Let us apply the identity (|4.73[) to the function h = <f> — (f>* p a and to the random 

1 

t d dct r+e 

i d det T 



variable W = W e = utt^f , _ £ ^>°°', we obtain 



E 



<j) * Pa ){V~tF + VT^tx) 



t d det T + e 



t d -^ E 



A t (W e ) / ( ( j>^^*p a ){y)d i Q d (VtF+VT~t^~y)dy 



(4.74) 



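From the hypercontractivity property together with the equality

\[
A_i(W_\varepsilon) = \sum_{a=1}^{d} \left\{ -\frac{1}{t^d \det\Gamma + \varepsilon}\Big( \langle D(\mathrm{Adj}\,\Gamma)_{a,i}, DF_a \rangle_{\mathfrak{H}} + (\mathrm{Adj}\,\Gamma)_{a,i}\, LF_a \Big) + \frac{t^d\,(\mathrm{Adj}\,\Gamma)_{a,i}\, \langle D(\det\Gamma), DF_a \rangle_{\mathfrak{H}}}{(t^d \det\Gamma + \varepsilon)^2} \right\}
\]

one immediately deduces the existence of $c_{q,d,\|C\|_{H.S.}} > 0$ such that

\[
E\left[A_i(W_\varepsilon)^2\right] \le c_{q,d,\|C\|_{H.S.}}\, \varepsilon^{-4}.
\]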

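On the other hand, in [2U page 13] it is shown that there exists $c_d > 0$ such that, for any $R > 0$ and $u \in \mathbb{R}^d$,

\[
\left| \int_{\mathbb{R}^d} (\phi - \phi*\rho_\alpha)(y)\,\partial_i Q_d(u - y)\,dy \right|
\le c_d \left( R + \alpha + \alpha R^{-d} (\|u\|_1 + M)^d \right).
\]
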

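Substituting this estimate into (4.74) and assuming that $M \ge 1$ yields

\[
\left| E\left[(\phi - \phi*\rho_\alpha)(\sqrt{t}F + \sqrt{1-t}\,x)\,\frac{t^d \det\Gamma}{t^d \det\Gamma + \varepsilon}\right]\right|
\le c_{q,d,\|C\|_{H.S.}}\, \varepsilon^{-2} \left( R + \alpha + \alpha R^{-d} (M + \|x\|_1)^d \right).
\]

Choosing $R = \alpha^{\frac{1}{d+1}} (M + \|x\|_1)^{\frac{d}{d+1}}$ and assuming $\alpha \le 1$, we obtain

\[
\left| E\left[(\phi - \phi*\rho_\alpha)(\sqrt{t}F + \sqrt{1-t}\,x)\,\frac{t^d \det\Gamma}{t^d \det\Gamma + \varepsilon}\right]\right|
\le c_{q,d,\|C\|_{H.S.}}\, \varepsilon^{-2}\, \alpha^{\frac{1}{d+1}} (M + \|x\|_1)^{\frac{d}{d+1}}.
\tag{4.75}
\]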
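
The choice of $R$ leading to (4.75) can be verified directly. In the sketch below, $K$ is a shorthand introduced only for this check (it does not appear in the original), standing for $M + \|x\|_1 \ge 1$; the bound being optimised is assumed to have the form $R + \alpha + \alpha R^{-d} K^d$.

```latex
\begin{align*}
R = \alpha^{\frac{1}{d+1}} K^{\frac{d}{d+1}}
\;\Longrightarrow\;
\alpha R^{-d} K^{d}
  &= \alpha \cdot \alpha^{-\frac{d}{d+1}}\, K^{-\frac{d^{2}}{d+1}}\, K^{d}
   = \alpha^{\frac{1}{d+1}} K^{\frac{d}{d+1}} = R,\\
\alpha &\le \alpha^{\frac{1}{d+1}} \le \alpha^{\frac{1}{d+1}} K^{\frac{d}{d+1}} = R
   \qquad (0 < \alpha \le 1,\; K \ge 1),\\
R + \alpha + \alpha R^{-d} K^{d} &\le 3R
   = 3\,\alpha^{\frac{1}{d+1}} K^{\frac{d}{d+1}}.
\end{align*}
```

The factor 3 is absorbed into the constant $c_{q,d,\|C\|_{H.S.}}$.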






It is worthwhile noting that the inequality (4.75) is valid for any $t \in [\frac12, 1]$, in particular for $t = 1$.



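Step 6. From (4.71), (4.72) and (4.75) we obtain

\[
\begin{aligned}
\left| E[\phi(\sqrt{t}F + \sqrt{1-t}\,x)] - E[\phi(F)] \right|
&\le c_{d,\|C\|_{H.S.}}\, \frac{\sqrt{1-t}}{\alpha}\,(1 + \|x\|_1) + c_{q,d}\left(\frac{\varepsilon}{\beta}\right)^{\frac{1}{N+1}} \\
&\quad + c_{q,d,\|C\|_{H.S.}}\, \varepsilon^{-2}\, \alpha^{\frac{1}{d+1}} (M + \|x\|_1)^{\frac{d}{d+1}}.
\end{aligned}
\]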



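By plugging this inequality into (4.70) we thus obtain that, for every $M \ge 1$, $\varepsilon > 0$ and $0 < \alpha \le 1$:

\[
\begin{aligned}
TV(\sqrt{t}F + \sqrt{1-t}\,x, F)
&\le c_{d,\|C\|_{H.S.}}\, \frac{\sqrt{1-t}}{\alpha}\,(1 + \|x\|_1) + c_{q,d}\left(\frac{\varepsilon}{\beta}\right)^{\frac{1}{N+1}} \\
&\quad + c_{q,d,\|C\|_{H.S.}}\, \varepsilon^{-2}\, \alpha^{\frac{1}{d+1}} (M + \|x\|_1)^{\frac{d}{d+1}} + \frac{c_{q,d,\|C\|_{H.S.}}}{M}\,(1 + \|x\|_1) \\
&\le c_{q,d,\|C\|_{H.S.}}\,(1 + \|x\|_1)\left(\beta^{-\frac{1}{N+1}} \wedge 1\right)\left\{ \varepsilon^{\frac{1}{N+1}} + \frac{\sqrt{1-t}}{\alpha} + \frac{\alpha^{\frac{1}{d+1}} M}{\varepsilon^{2}} + \frac{1}{M} \right\}.
\end{aligned}
\tag{4.76}
\]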



1 W+l (2W + 4)(d+l) 

Choosing M — e «+i , e = a^w+^+i) and a = (1 - £)2«2iv + 4)(d+i)+i) ; one obtains the 



desired conclusion (|4.69[) . 
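
The exponent bookkeeping behind these three choices can be written out as follows. This is a sketch under the assumption that the bracket in (4.76) consists of the four terms $\varepsilon^{1/(N+1)}$, $\sqrt{1-t}/\alpha$, $\alpha^{1/(d+1)} M \varepsilon^{-2}$ and $1/M$; the symbol $K$ is a shorthand introduced here and is not part of the original notation.

```latex
\begin{align*}
&\text{Set } K := (2N+4)(d+1), \text{ so that } \tfrac{1}{d+1} = \tfrac{2N+4}{K}.\\[4pt]
&M = \varepsilon^{-\frac{1}{N+1}}:
  \quad \frac{1}{M} = \varepsilon^{\frac{1}{N+1}},
  \qquad \frac{\alpha^{\frac{1}{d+1}} M}{\varepsilon^{2}}
        = \alpha^{\frac{2N+4}{K}}\,\varepsilon^{-\frac{2N+3}{N+1}}.\\[4pt]
&\varepsilon = \alpha^{\frac{N+1}{K}}:
  \quad \varepsilon^{\frac{1}{N+1}} = \alpha^{\frac{1}{K}},
  \qquad \alpha^{\frac{2N+4}{K}}\,\varepsilon^{-\frac{2N+3}{N+1}}
        = \alpha^{\frac{2N+4}{K}-\frac{2N+3}{K}} = \alpha^{\frac{1}{K}}.\\[4pt]
&\alpha = (1-t)^{\frac{K}{2(K+1)}}:
  \quad \alpha^{\frac{1}{K}} = (1-t)^{\frac{1}{2(K+1)}},
  \qquad \frac{\sqrt{1-t}}{\alpha}
        = (1-t)^{\frac12-\frac{K}{2(K+1)}} = (1-t)^{\frac{1}{2(K+1)}}.
\end{align*}
```

With these choices all four terms in the bracket equal $(1-t)^{\frac{1}{2(K+1)}} = (1-t)^{\frac{1}{2(2N+4)(d+1)+2}}$. Note also that $M \ge 1$ and $0 < \alpha \le 1$ hold automatically for $t \in [\frac12, 1)$, since then $\alpha \le 1$ and hence $\varepsilon \le 1$.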



4.3 Proof of Theorem 4.1

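In the proof of [37, Theorem 4.3], the following two facts have been shown:

\[
E\left[\left( C_n(j,k) - \frac{1}{q_j} \langle DF_{j,n}, DF_{k,n} \rangle_{\mathfrak{H}} \right)^{2}\right]
\le \mathrm{Cov}(F_{j,n}^2, F_{k,n}^2) - 2\,C_n(j,k)^2,
\]

\[
E[\|F_n\|^4] - E[\|Z_n\|^4] = \sum_{j,k=1}^{d} \left\{ \mathrm{Cov}(F_{j,n}^2, F_{k,n}^2) - 2\,C_n(j,k)^2 \right\}.
\]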
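
The second fact is an exact identity that can be checked in two lines. The sketch below assumes, as the notation suggests, that $Z_n$ is a centered Gaussian vector with the same covariance matrix $C_n$ as $F_n$, and it uses the Gaussian fourth-moment (Isserlis/Wick) formula for $E[Z_{j,n}^2 Z_{k,n}^2]$.

```latex
\begin{align*}
E[\|F_n\|^4]
  &= \sum_{j,k=1}^{d} E[F_{j,n}^2 F_{k,n}^2]
   = \sum_{j,k=1}^{d} \Big\{ \mathrm{Cov}(F_{j,n}^2, F_{k,n}^2) + C_n(j,j)\,C_n(k,k) \Big\},\\
E[\|Z_n\|^4]
  &= \sum_{j,k=1}^{d} E[Z_{j,n}^2 Z_{k,n}^2]
   = \sum_{j,k=1}^{d} \Big\{ C_n(j,j)\,C_n(k,k) + 2\,C_n(j,k)^2 \Big\},\\
E[\|F_n\|^4] - E[\|Z_n\|^4]
  &= \sum_{j,k=1}^{d} \Big\{ \mathrm{Cov}(F_{j,n}^2, F_{k,n}^2) - 2\,C_n(j,k)^2 \Big\}.
\end{align*}
```

Summing the first fact over $j,k$ and comparing with this identity is what produces the bound by $\Delta_n$ in the next display.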



As a consequence, one deduces that

\[
\sum_{j,k=1}^{d} E\left[\left( C_n(j,k) - \frac{1}{q_j} \langle DF_{j,n}, DF_{k,n} \rangle_{\mathfrak{H}} \right)^{2}\right] \le \Delta_n.
\]



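Using Proposition 3.7 one infers immediately that

\[
\tau_n^{j,k}(x) := \frac{1}{q_j}\, E\left[ \langle DF_{j,n}, DF_{k,n} \rangle_{\mathfrak{H}} \,\middle|\, F_n = x \right], \qquad j,k = 1, \dots, d,
\tag{4.77}
\]

defines a Stein's matrix for $F_n$, which moreover satisfies the relation

\[
\sum_{j,k=1}^{d} E\left[\left( C(j,k) - \tau_n^{j,k}(F_n) \right)^{2}\right] \le \Delta_n.
\tag{4.78}
\]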



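Now let $\Gamma_n$ denote the Malliavin matrix of $F_n$. Thanks to [39, Lemma 6], we know that, for any $i,j = 1, \dots, d$,

\[
\langle DF_{i,n}, DF_{j,n} \rangle_{\mathfrak{H}} \longrightarrow \sqrt{q_i q_j}\; C(i,j) \qquad \text{as } n \to \infty.
\tag{4.79}
\]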
Since $\langle DF_{i,n}, DF_{j,n} \rangle_{\mathfrak{H}}$ lives in a finite sum of chaoses (see e.g. [33, Chapter 5]) and is bounded in $L^2(\sigma(G))$, we can again apply the hypercontractive estimate (3.64) to deduce that $\langle DF_{i,n}, DF_{j,n} \rangle_{\mathfrak{H}}$ is actually bounded in $L^p(\sigma(G))$ for every $p \ge 1$, so that the convergence in (4.79) is in the sense of any of the spaces $L^p(\sigma(G))$. As a consequence, $E[\det\Gamma_n] \to \det C \prod_{i=1}^{d} q_i =: \gamma > 0$, and there exists $n_0$ large enough so that

\[
\inf_{n \ge n_0} E[\det\Gamma_n] > 0.
\]
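
The value of the limit $\gamma$ follows from a one-line determinant computation. In the sketch below, the diagonal matrix $D$ is an auxiliary object introduced only for this check; the entrywise limit $\sqrt{q_i q_j}\,C(i,j)$ is the one stated in (4.79).

```latex
\begin{align*}
&D := \mathrm{diag}\big(\sqrt{q_1}, \dots, \sqrt{q_d}\big)
\;\Longrightarrow\;
\big(\sqrt{q_i q_j}\; C(i,j)\big)_{1 \le i,j \le d} = D\, C\, D,\\
&\det(D\, C\, D) = (\det D)^{2} \det C = \Big(\prod_{i=1}^{d} q_i\Big) \det C .
\end{align*}
```

Since $\det\Gamma_n$ is a polynomial of degree $d$ in the entries of $\Gamma_n$, and these entries converge in every $L^p(\sigma(G))$, Hölder's inequality gives convergence of $E[\det\Gamma_n]$ to this determinant, which is strictly positive as soon as $\det C > 0$.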

We are now able to deduce from Lemma 4.4 the existence of two constants $\kappa > 0$ and $\alpha \in (0, \frac12]$ such that, for all $x \in \mathbb{R}^d$, $t \in [\frac12, 1]$ and $n \ge n_0$,

\[
TV(\sqrt{t}F_n + \sqrt{1-t}\,x, F_n) \le \kappa\,(1 + \|x\|_1)(1 - t)^{\alpha}.
\]

This means that relation (2.53) is satisfied uniformly in $n$. Concerning (2.52), again by hypercontractivity and using the representation (4.77), one has that, for all $\eta > 0$,

\[
\sup_{n} E\left[ |\tau_n^{j,k}(F_n)|^{\eta+2} \right] < \infty, \qquad j,k = 1, \dots, d.
\]

Finally, since $\Delta_n \to 0$ and because (4.78) holds true, the condition (2.54) is satisfied for $n$ large enough. The proof of (4.66) is concluded by applying Theorem 2.11.

4.4 Proof of Corollary 4.2

In view of Theorem 4.1, one has only to prove that (b) implies (a). This is an immediate consequence of the fact that the covariance $C_n$ converges to $C$, and that the sequence $\{F_n\}$ lives in a finite sum of Wiener chaoses. Indeed, by virtue of (3.64) one has that

\[
\sup_{n} E[\|F_n\|^p] < \infty, \qquad \forall p \ge 1,
\]

yielding in particular that, if $F_n$ converges in distribution to $Z$, then $E\|F_n\|^4 \to E\|Z\|^4$ or, equivalently, $\Delta_n \to 0$. The proof is concluded.